Preface



Welcome to Collaborative Statistics, presented by Connexions. The initial section below introduces you to
Connexions. If you are familiar with Connexions, please skip to About "Collaborative Statistics."

About Connexions 

Connexions Modular Content 

Connexions (cnx.org 2 ) is an online, open access educational resource dedicated to providing high quality 
learning materials free online, free in printable PDF format, and at low cost in bound volumes through 
print-on-demand publishing. The Collaborative Statistics textbook is one of many collections available 
to Connexions users. Each collection is composed of a number of re-usable learning modules written in 
the Connexions XML markup language. Each module may also be re-used (or 're-purposed') as part of 
other collections and may be used outside of Connexions. Including Collaborative Statistics, Connexions 
currently offers over 6500 modules and more than 350 collections. 

The modules of Collaborative Statistics are derived from the original paper version of the textbook under 
the same title, Collaborative Statistics. Each module represents a self-contained concept from the original 
work. Together, the modules comprise the original textbook. 

Re-use and Customization 

The Creative Commons (CC) Attribution license 3 applies to all Connexions modules. Under this license, 
any module in Connexions may be used or modified for any purpose as long as proper attribution to the 
original author(s) is maintained. Connexions' authoring tools make re-use (or re-purposing) easy. Therefore,
instructors anywhere are permitted to create customized versions of the Collaborative Statistics textbook
by editing modules, deleting unneeded modules, and adding their own supplementary modules.
Connexions' authoring tools keep track of these changes and maintain the CC license's required attribution 
to the original authors. This process creates a new collection that can be viewed online, downloaded as a 
single PDF file, or ordered in any quantity by instructors and students as a low-cost printed textbook. To 
start building custom collections, please visit the help page, "Create a Collection with Existing Modules" 4 . 
For a guide to authoring modules, please look at the help page, "Create a Module in Minutes" 5 . 

Read the book online, print the PDF, or buy a copy of the book. 

To browse the Collaborative Statistics textbook online, visit the collection home page at 
cnx.org/content/coll0522/latest 6 . You will then have three options. 



1 This content is available online at <http://cnx.org/content/ml6026/l.16/>.

2 http://cnx.org/ 

3 http://creativecommons.org/licenses/by/2.0/ 

4 http://cnx.org/help/CreateCollection 

5 http://cnx.org/help/ModuleInMinutes 

6 Collaborative Statistics <http://cnx.org/content/coll0522/latest/> 



1. You may obtain a PDF of the entire textbook to print or view offline by clicking on the "Download 
PDF" link in the "Content Actions" box. 

2. You may order a bound copy of the collection by clicking on the "Order Printed Copy" button. 

3. You may view the collection modules online by clicking on the "Start" link, which takes you to the
first module in the collection. You can then navigate through the subsequent modules by using their
"Next" and "Previous" links to move forward and backward in the collection. You can jump to
any module in the collection by clicking on that module's title in the "Collection Contents" box on the 
left side of the window. If these contents are hidden, make them visible by clicking on "[show table 
of contents]". 

Accessibility and Section 508 Compliance 

• For information on general Connexions accessibility features, please visit 
http://cnx.org/content/ml7212/latest/ 7 . 

• For information on accessibility features specific to the Collaborative Statistics textbook, please visit 
http://cnx.org/content/ml7211/latest/ 8 . 

Version Change History and Errata 

• For a list of modifications, updates, and corrections, please visit 
http://cnx.org/content/ml7360/latest/ 9 . 

Adoption and Usage 

• The Collaborative Statistics collection has been adopted and customized by a number of professors
and educators for use in their classes. For a list of known versions and adopters, please visit
http://cnx.org/content/ml8261/latest/ 10 .

About "Collaborative Statistics" 

Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza College
in Cupertino, California. The textbook was developed over several years and has been used in regular
and honors-level classroom settings and in distance learning classes. Courses using this textbook have been
articulated by the University of California for transfer of credit. The textbook contains full materials for
course offerings, including expository text, examples, labs, homework, and projects. A Teacher's Guide is
currently available in print form and on the Connexions site at http://cnx.org/content/coll0547/latest/ 11 ,
and supplemental course materials including additional problem sets and video lectures are available at
http://cnx.org/content/coll0586/latest/ 12 . The on-line text for each of these collections will
meet the Section 508 standards for accessibility.

An on-line course based on the textbook was also developed by Illowsky and Dean. It has won an award 
as the best on-line California community college course. The on-line course will be available at a later date 
as a collection in Connexions, and each lesson in the on-line course will be linked to the on-line textbook 
chapter. The on-line course will include, in addition to expository text and examples, videos of course 
lectures in captioned and non-captioned format. 

The original preface to the book, as written by Professors Illowsky and Dean, now follows:



7 "Accessibility Features of Connexions" <http://cnx.org/content/ml7212/latest/> 
8 "Collaborative Statistics: Accessibility" <http://cnx.org/content/ml7211/latest/> 
9 "Collaborative Statistics: Change History" <http://cnx.org/content/ml7360/latest/> 
10 "Collaborative Statistics: Adoption and Usage" <http://cnx.org/content/ml8261/latest/> 
11 Collaborative Statistics Teacher's Guide <http://cnx.org/content/coll0547/latest/> 
12 Collaborative Statistics: Supplemental Course Materials <http://cnx.org/content/coll0586/latest/> 



This book is intended for introductory statistics courses being taken by students at two- and four-year 
colleges who are majoring in fields other than math or engineering. Intermediate algebra is the only
prerequisite. The book focuses on applications of statistical knowledge rather than the theory behind it. The
text is named Collaborative Statistics because students learn best by doing. In fact, they learn best by 
working in small groups. The old saying "two heads are better than one" truly applies here. 

Our emphasis in this text is on four main concepts: 



• thinking statistically

• incorporating technology

• working collaboratively

• writing thoughtfully



These concepts are integral to our course. Students learn best by actively participating, not by just
watching and listening. Teaching should be highly interactive. Students need to be thoroughly engaged 
in the learning process in order to make sense of statistical concepts. Collaborative Statistics provides 
techniques for students to write across the curriculum, to collaborate with their peers, to think statistically, 
and to incorporate technology. 

This book takes students step by step. The text is interactive. Therefore, students can immediately apply 
what they read. Once students have completed the process of problem solving, they can tackle interesting 
and challenging problems relevant to today's world. The problems require the students to apply their 
newly found skills. In addition, technology (TI-83 graphing calculators are highlighted) is incorporated 
throughout the text and the problems, as well as in the special group activities and projects. The book also 
contains labs that use real data and practices that lead students step by step through the problem solving 
process. 

At De Anza, along with hundreds of other colleges across the country, the college audience involves a 
large number of ESL students as well as students from many disciplines. The ESL students, as well as 
the non-ESL students, have been especially appreciative of this text. They find it extremely readable and 
understandable. Collaborative Statistics has been used in classes that range from 20 to 120 students, and in 
regular, honors, and distance learning classes.

Susan Dean 

Barbara Illowsky 
Additional Resources Currently Available

• Glossary

• View or Download This Textbook Online

• Collaborative Statistics Teacher's Guide

• Supplemental Materials

• Video Lectures

• Version History

• Textbook Adoption and Usage

• Additional Technologies and Notes

• Accessibility and Section 508 Compliance

The following section describes some additional resources for learners and educators. These modules and 
collections are all available on the Connexions website (http://cnx.org/ 14 ) and can be viewed online, 
downloaded, printed, or ordered as appropriate. 

Glossary 

This module contains the entire glossary for the Collaborative Statistics textbook collection (coll0522) since 
its initial release on 15 July 2008. The glossary is located at http://cnx.org/content/ml6129/latest/ 15 . 

View or Download This Textbook Online 

The complete contents of this book are available at no cost on the Connexions website at 
http://cnx.org/content/coll0522/latest/ 16 . Anybody can view this content free of charge either as an 
online e-book or a downloadable PDF file. A low-cost printed version of this textbook is also available
here 17 .



Collaborative Statistics Teacher's Guide 

A complementary Teacher's Guide for Collaborative Statistics is available through Connexions at
http://cnx.org/content/coll0547/latest/ 18 . The Teacher's Guide includes suggestions for presenting concepts
found throughout the book as well as recommended homework assignments. A low-cost printed
version of this textbook is also available here 19 .

Supplemental Materials 

This companion to Collaborative Statistics provides a number of additional resources for use by students 
and instructors based on the award-winning Elementary Statistics Sofia online course 20 , also by textbook



13 This content is available online at <http://cnx.org/content/ml8746/l.6/>.

14 http://cnx.org/ 

15 "Collaborative Statistics: Glossary" <http://cnx.org/content/ml6129/latest/> 

16 Collaborative Statistics <http://cnx.org/content/coll0522/latest/> 

17 http://my.qoop.com/store/7064943342106149/7781159220340 

18 Collaborative Statistics Teacher's Guide <http://cnx.org/content/coll0547/latest/> 

19 http://my.qoop.com/store/7064943342106149/8791310589747 

20 http://sofia.fhda.edu/gallery/statistics/index.html 



authors Barbara Illowsky and Susan Dean. This content is designed to complement the textbook by providing
video tutorials, course management materials, and sample problem sets. The Supplemental Materials
collection can be found at http://cnx.org/content/coll0586/latest/ 21 . 

Video Lectures 



• Video Lecture 1

• Video Lecture 2: Descriptive Statistics 23

• Video Lecture 3: Probability Topics 24

• Video Lecture 4: Discrete Distributions 25

• Video Lecture 5: Continuous Random Variables 26

• Video Lecture 6: The Normal Distribution 27

• Video Lecture 7: The Central Limit Theorem 28

• Video Lecture 8: Confidence Intervals 29

• Video Lecture 9: Hypothesis Testing with a Single Mean 30

• Video Lecture 10: Hypothesis Testing with Two Means 31

• Video Lecture 11: The Chi-Square Distribution 32

• Video Lecture 12: Linear Regression and Correlation 33

Version History 

This module contains a listing of changes, updates, and corrections made to the Collaborative Statistics 
textbook collection (coll0522) since its initial release on 15 July 2008. The Version History is located at 
http://cnx.org/content/ml7360/latest/ 34 .

Textbook Adoption and Usage 

This module is designed to track the various derivations of the Collaborative Statistics textbook and its 
various companion resources, as well as keep track of educators who have adopted various versions for 
their courses. New adopters are encouraged to provide their contact information and describe how they 
will use this book for their courses. The goal is to provide a list that will allow educators using this book 
to collaborate, share ideas, and make suggestions for future development of this text. The Adoption and 
Usage module is located at http://cnx.org/content/ml8261/latest/ 35 . 

Additional Technologies 

In order to provide the most flexible learning resources possible, we invite collaboration from all instructors 
wishing to create customized versions of this content for use with other technologies. For instance, you may 
be interested in creating a set of instructions similar to this collection's calculator notes. If you would like to 
contribute to this collection, please contact the authors with any ideas or materials you have created.

Accessibility and Section 508 Compliance 
• For information on general Connexions accessibility features, please visit
http://cnx.org/content/ml7212/latest/ .
Student Welcome Letter



Dear Student: 

Have you heard others say, "You're taking statistics? That's the hardest course I ever took!" They say that
because they probably spent the entire course confused and struggling. They were probably lectured to
and never had the chance to experience the subject. You will not have that problem. Let's find out why. 

There is a Chinese proverb that describes our feelings about the field of statistics:

I HEAR, AND I FORGET 

I SEE, AND I REMEMBER 

I DO, AND I UNDERSTAND 

Statistics is a "do" field. In order to learn it, you must "do" it. We have structured this book so that you will 
have hands-on experiences. They will enable you to truly understand the concepts instead of merely going 
through the requirements for the course. 

What makes this book different from other texts? First, we have eliminated the drudgery of tedious calculations.
You might be using computers or graphing calculators so that you do not need to struggle with
algebraic manipulations. Second, this course is taught as a collaborative activity. With others in your class, 
you will work toward the common goal of learning this material. 

Here are some hints for success in your class: 



• Work hard and work every night. 

• Form a study group and learn together. 

• Don't get discouraged - you can do it! 

• As you solve problems, ask yourself, "Does this answer make sense?" 

• Many statistics words have the same meaning as in everyday English. 

• Go to your teacher for help as soon as you need it. 

• Don't get behind. 

• Read the newspaper and ask yourself, "Does this article make sense?" 

• Draw pictures - they truly help! 

Good luck and don't give up! 

Sincerely, 

Susan Dean and Barbara Illowsky 

39 This content is available online at <http://cnx.org/content/ml6305/l.5/>.
De Anza College

21250 Stevens Creek Blvd. 

Cupertino, California 95014 



Chapter 1 

Sampling and Data 

1.1 Sampling and Data 1 
1.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Recognize and differentiate between key terms. 

• Apply various types of sampling methods to data collection. 

• Create and interpret frequency tables. 



1.1.2 Introduction 

You are probably asking yourself the question, "When and where will I use statistics?" If you read any
newspaper, watch television, or use the Internet, you will see statistical information. There are statistics
about crime, sports, education, politics, and real estate. Typically, when you read a newspaper article or 
watch a news program on television, you are given sample information. With this information, you may 
make a decision about the correctness of a statement, claim, or "fact." Statistical methods can help you make 
the "best educated guess." 

Since you will undoubtedly be given statistical information at some point in your life, you need to know 
some techniques to analyze the information thoughtfully. Think about buying a house or managing a 
budget. Think about your chosen profession. The fields of economics, business, psychology, education, 
biology, law, computer science, police science, and early childhood development require at least one course 
in statistics. 

Included in this chapter are the basic ideas and words of probability and statistics. You will soon understand
that statistics and probability work together. You will also learn how data are gathered and what
"good" data are. 

1.2 Statistics 2 

The science of statistics deals with the collection, analysis, interpretation, and presentation of data. We see 
and use data in our everyday lives. 



1 This content is available online at <http://cnx.org/content/ml6008/l.9/>.
2 This content is available online at <http://cnx.org/content/ml6020/l.14/>.

1.2.1 Optional Collaborative Classroom Exercise 

In your classroom, try this exercise. Have class members write down the average time (in hours, to the 
nearest half-hour) they sleep per night. Your instructor will record the data. Then create a simple graph 
(called a dot plot) of the data. A dot plot consists of a number line and dots (or points) positioned above 
the number line. For example, consider the following data: 

5; 5.5; 6; 6; 6; 6.5; 6.5; 6.5; 6.5; 7; 7; 8; 8; 9 

The dot plot for this data would be as follows: 

[Figure 1.1: Dot plot, "Frequency of Average Time (in Hours) Spent Sleeping per Night"]
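If you would like to see the dot plot for the sleep data above drawn on a computer, the short Python sketch below stacks one "o" above the number line for each data value. It is only one possible way to draw the plot. The function name text_dot_plot, the column width, and the half-hour tick spacing are our own choices for illustration, not part of the exercise.

```python
from collections import Counter

# Sleep times (hours, to the nearest half-hour) from the example data.
sleep_hours = [5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9]


def text_dot_plot(values, step=0.5, width=5):
    """Return the lines of a text dot plot (hypothetical helper).

    Each value gets one "o" stacked above its position on the number line.
    `step` is the spacing between ticks and `width` is the number of
    characters used for each tick column.
    """
    counts = Counter(values)
    low, high = min(values), max(values)
    n_ticks = int(round((high - low) / step)) + 1
    ticks = [low + i * step for i in range(n_ticks)]

    lines = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("o" if counts.get(t, 0) >= level else "").center(width)
            for t in ticks
        )
        lines.append(row.rstrip())
    # The number line itself goes last.
    lines.append("".join(f"{t:g}".center(width) for t in ticks).rstrip())
    return lines


for line in text_dot_plot(sleep_hours):
    print(line)
```

The tallest stack sits above 6.5 because four class members reported that value. Try replacing sleep_hours with the data collected in your own classroom.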



Does your dot plot look the same as or different from the example? Why? If you did the same example in 
an English class with the same number of students, do you think the results would be the same? Why or 
why not? 

Where do your data appear to cluster? How could you interpret the clustering? 

The questions above ask you to analyze and interpret your data. With this example, you have begun your 
study of statistics. 

In this course, you will learn how to organize and summarize data. Organizing and summarizing data is 
called descriptive statistics. Two ways to summarize data are by graphing and by numbers (for example, 
finding an average). After you have studied probability and probability distributions, you will use formal 
methods for drawing conclusions from "good" data. The formal methods are called inferential statistics. 
Statistical inference uses probability to determine how confident we can be that the conclusions are correct. 

Effective interpretation of data (inference) is based on good procedures for producing data and thoughtful 
examination of the data. You will encounter what will seem to be too many mathematical formulas for 
interpreting data. The goal of statistics is not to perform numerous calculations using the formulas, but to 
gain an understanding of your data. The calculations can be done using a calculator or a computer. The 
understanding must come from you. If you can thoroughly grasp the basics of statistics, you can be more 
confident in the decisions you make in life. 
1.3 Probability 3

Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of 
an event occurring. For example, if you toss a fair coin 4 times, the outcomes may not be 2 heads and 2 
tails. However, if you toss the same coin 4,000 times, the outcomes will be close to half heads and half tails. 
The expected theoretical probability of heads in any one toss is 1/2 or 0.5. Even though the outcomes of a
few repetitions are uncertain, there is a regular pattern of outcomes when there are many repetitions. After
reading about the English statistician Karl Pearson, who tossed a coin 24,000 times with a result of 12,012
heads, one of the authors tossed a coin 2,000 times. The results were 996 heads. The fraction 996/2000 is equal
to 0.498, which is very close to 0.5, the expected probability.
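You can repeat the coin-tossing experiment without wearing out your thumb. The Python sketch below first checks the two fractions quoted above, then simulates a fair coin for an increasing number of tosses. The function name fraction_heads and the seed value are arbitrary choices of ours, and a simulated coin is only a stand-in for a real one.

```python
import random

# Fractions of heads reported in the text.
pearson_fraction = 12012 / 24000   # Karl Pearson's 24,000 tosses
author_fraction = 996 / 2000       # the author's 2,000 tosses

print("Pearson:", pearson_fraction)
print("Author: ", author_fraction)


def fraction_heads(n_tosses, rng):
    """Simulate n_tosses of a fair coin and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2008)  # arbitrary seed so the run can be repeated
for n in (4, 40, 400, 4000, 40000):
    print(n, "tosses:", fraction_heads(n, rng))
```

With only 4 tosses the fraction of heads can be far from 0.5. As the number of tosses grows, the fraction tends to settle near 0.5, which is the regular pattern described above.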

The theory of probability began with the study of games of chance such as poker. Predictions take the form 
of probabilities. To predict the likelihood of an earthquake, of rain, or whether you will get an A in this 
course, we use probabilities. Doctors use probability to determine the chance of a vaccination causing the 
disease the vaccination is supposed to prevent. A stockbroker uses probability to determine the rate of 
return on a client's investments. You might use probability to decide to buy a lottery ticket or not. In your 
study of statistics, you will use the power of mathematics through probability calculations to analyze and 
interpret your data. 

1.4 Key Terms 4 

In statistics, we generally want to study a population. You can think of a population as an entire collection 
of persons, things, or objects under study. To study the larger population, we select a sample. The idea of 
sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to 
gain information about the population. Data are the result of sampling from a population. 

Because it takes a lot of time and money to examine an entire population, sampling is a very practical 
technique. If you wished to compute the overall grade point average at your school, it would make sense 
to select a sample of students who attend the school. The data collected from the sample would be the 
students' grade point averages. In presidential elections, opinion poll samples of 1,000 to 2,000 people are 
taken. The opinion poll is supposed to represent the views of the people in the entire country. Manufacturers
of canned carbonated drinks take samples to determine if a 16-ounce can contains 16 ounces of
carbonated drink. 

From the sample data, we can calculate a statistic. A statistic is a number that is a property of the sample. 
For example, if we consider one math class to be a sample of the population of all math classes, then the 
average number of points earned by students in that one math class at the end of the term is an example of 
a statistic. The statistic is an estimate of a population parameter. A parameter is a number that is a property 
of the population. Since we considered all math classes to be the population, then the average number of 
points earned per student over all the math classes is an example of a parameter. 

One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. The 
accuracy really depends on how well the sample represents the population. The sample must contain the 
characteristics of the population in order to be a representative sample. We are interested in both the 
sample statistic and the population parameter in inferential statistics. In a later chapter, we will use the 
sample statistic to test the validity of the established population parameter. 
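To see the difference between a parameter and a statistic in action, consider the Python sketch below. It builds a made-up population of grade point averages, computes the population average (the parameter), and then computes the average of a random sample (the statistic). The population values are generated by the computer and are not data from any real school. The population size, the sample size of 100, and all variable names are assumptions for illustration.

```python
import random
from statistics import fmean

rng = random.Random(1)  # arbitrary seed so the run can be repeated

# Hypothetical population: one GPA (0.0 to 4.0) for each of 5,000 students.
population = [
    round(min(4.0, max(0.0, rng.gauss(3.0, 0.5))), 2) for _ in range(5000)
]

# The parameter is a property of the whole population.
parameter = fmean(population)

# The statistic is a property of the sample and estimates the parameter.
sample = rng.sample(population, 100)
statistic = fmean(sample)

print("population average (parameter):", round(parameter, 3))
print("sample average (statistic):    ", round(statistic, 3))
```

In practice we rarely know the parameter, because examining the whole population costs too much time and money. Here we can compare the two numbers only because the population is artificial.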

A variable, notated by capital letters like X and Y, is a characteristic of interest for each person or thing in 
a population. Variables may be numerical or categorical. Numerical variables take on values with equal 

3 This content is available online at <http://cnx.org/content/ml6015/l.ll/>.
4 This content is available online at <http://cnx.org/content/ml6007/l.16/>.
units such as weight in pounds and time in hours. Categorical variables place the person or thing into a
category. If we let X equal the number of points earned by one math student at the end of a term, then X 
is a numerical variable. If we let Y be a person's party affiliation, then examples of Y include Republican, 
Democrat, and Independent. Y is a categorical variable. We could do some math with values of X (calculate 
the average number of points earned, for example), but it makes no sense to do math with values of Y 
(calculating an average party affiliation makes no sense). 

Data are the actual values of the variable. They may be numbers or they may be words. Datum is a single 
value. 

Two words that come up often in statistics are mean and proportion. If you were to take three exams in
your math classes and obtain scores of 86, 75, and 92, you would calculate your mean score by adding the three
exam scores and dividing by three (your mean score would be 84.3 to one decimal place). If, in your math
class, there are 40 students and 22 are men and 18 are women, then the proportion of men students is 22/40
and the proportion of women students is 18/40. Mean and proportion are discussed in more detail in later
chapters.
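The arithmetic for the mean and the two proportions can be checked with a few lines of Python. This is only a sketch of the calculation using the numbers from the paragraph above; the variable names are our own.

```python
from fractions import Fraction

# Mean: add the three exam scores and divide by three.
scores = [86, 75, 92]
mean_score = sum(scores) / len(scores)
print("mean score:", round(mean_score, 1))

# Proportion: the count in a category divided by the total count.
men, women = 22, 18
total = men + women
proportion_men = Fraction(men, total)      # 22/40, stored in lowest terms
proportion_women = Fraction(women, total)  # 18/40, stored in lowest terms

print("proportion of men:  ", proportion_men, "=", float(proportion_men))
print("proportion of women:", proportion_women, "=", float(proportion_women))
```

Notice that the two proportions add up to 1, because every student in the class falls into exactly one of the two categories.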

NOTE: The words "mean" and "average" are often used interchangeably. The substitution of one 
word for the other is common practice. The technical term is "arithmetic mean" and "average" is 
technically a center location. However, in practice among non-statisticians, "average" is commonly 
accepted for "arithmetic mean." 

Example 1.1 

Define the key terms from the following study: We want to know the average amount of money 
first year college students spend at ABC College on school supplies that do not include books. We 
randomly survey 100 first year students at the college. Three of those students spent $150, $200, 
and $225, respectively. 

Solution 

The population is all first year students attending ABC College this term. 

The sample could be all students enrolled in one section of a beginning statistics course at ABC 
College (although this sample may not represent the entire population). 

The parameter is the average amount of money spent (excluding books) by first year college students
at ABC College this term.

The statistic is the average amount of money spent (excluding books) by first year college students 
in the sample. 

The variable could be the amount of money spent (excluding books) by one first year student. 
Let X = the amount of money spent (excluding books) by one first year student attending ABC 
College. 

The data are the dollar amounts spent by the first year students. Examples of the data are $150, 
$200, and $225. 



1.4.1 Optional Collaborative Classroom Exercise 

Do the following exercise collaboratively with up to four people per group. Find a population, a sample, 
the parameter, the statistic, a variable, and data for the following study: You want to determine the average 
number of glasses of milk college students drink per day. Suppose yesterday, in your English class, you
asked five students how many glasses of milk they drank the day before. The answers were 1, 0, 1, 3, and 4
glasses of milk.

1.5 Data 5 

Data may come from a population or from a sample. Small letters like x or y generally are used to represent 
data values. Most data can be put into the following categories: 

• Qualitative 

• Quantitative 

Qualitative data are the result of categorizing or describing attributes of a population. Hair color, blood 
type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative data. 
Qualitative data are generally described by words or letters. For instance, hair color might be black, dark 
brown, light brown, blonde, gray, or red. Blood type might be AB+, O-, or B+. Researchers often prefer to 
use quantitative data over qualitative data because they lend themselves more easily to mathematical analysis. For
example, it does not make sense to find an average hair color or blood type. 

Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes 
of a population. Amount of money, pulse rate, weight, number of people living in your town, and the 
number of students who take statistics are examples of quantitative data. Quantitative data may be either 
discrete or continuous. 

All data that are the result of counting are called quantitative discrete data. These data take on only certain 
numerical values. If you count the number of phone calls you receive for each day of the week, you might 
get 0, 1, 2, 3, etc. 

All data that are the result of measuring are quantitative continuous data, assuming that we can measure
accurately. Measuring angles in radians might result in the numbers π/6, π/3, π/2, π, 3π/4, etc. If you and your
friends carry backpacks with books in them to school, the numbers of books in the backpacks are discrete 
data and the weights of the backpacks are continuous data. 

Example 1.2: Data Sample of Quantitative Discrete Data 

The data are the number of books students carry in their backpacks. You sample five students. 
Two students carry 3 books, one student carries 4 books, one student carries 2 books, and one 
student carries 1 book. The numbers of books (3, 4, 2, and 1) are the quantitative discrete data. 

Example 1.3: Data Sample of Quantitative Continuous Data 

The data are the weights of the backpacks with the books in them. You sample the same five students.
The weights (in pounds) of their backpacks are 6.2, 7, 6.8, 9.1, 4.3. Notice that backpacks carrying 
three books can have different weights. Weights are quantitative continuous data because weights 
are measured. 

Example 1.4: Data Sample of Qualitative Data 

The data are the colors of backpacks. Again, you sample the same five students. One student has 
a red backpack, two students have black backpacks, one student has a green backpack, and one 
student has a gray backpack. The colors red, black, black, green, and gray are qualitative data. 
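The three backpack examples above can be placed side by side in a short Python sketch. It shows why averaging makes sense for the quantitative data but only counting makes sense for the qualitative data. The variable names are ours, and the order of the values within each list is arbitrary.

```python
from collections import Counter
from statistics import mean

# Example 1.2: quantitative discrete data (counted).
books = [3, 3, 4, 2, 1]

# Example 1.3: quantitative continuous data (measured, in pounds).
weights = [6.2, 7, 6.8, 9.1, 4.3]

# Example 1.4: qualitative data (described by words).
colors = ["red", "black", "black", "green", "gray"]

# Averages are meaningful for quantitative data.
mean_books = mean(books)
mean_weight = mean(weights)
print("average number of books:", mean_books)
print("average weight (pounds):", round(mean_weight, 2))

# For qualitative data we can count how often each category occurs,
# but an "average color" has no meaning.
color_counts = Counter(colors)
print("backpack colors:", dict(color_counts))
```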

NOTE: You may collect data as numbers and report it categorically. For example, the quiz scores 
for each student are recorded throughout the term. At the end of the term, the quiz scores are 
reported as A, B, C, D, or F. 



5 This content is available online at <http://cnx.org/content/ml6005/l.15/>.
Example 1.5

Work collaboratively to determine the correct data type (quantitative or qualitative). Indicate 
whether quantitative data are continuous or discrete. Hint: Data that are discrete often start with 
the words "the number of." 

1. The number of pairs of shoes you own.

2. The type of car you drive. 

3. Where you go on vacation. 

4. The distance it is from your home to the nearest grocery store. 

5. The number of classes you take per school year. 

6. The tuition for your classes.

7. The type of calculator you use. 

8. Movie ratings. 

9. Political party preferences. 

10. Weight of sumo wrestlers. 

11. Amount of money (in dollars) won playing poker. 

12. Number of correct answers on a quiz. 

13. People's attitudes toward the government.

14. IQ scores. (This may cause some discussion.) 



1.6 Variation 6 

1.6.1 Variation in Data 

Variation is present in any set of data. For example, 16-ounce cans of beverage may contain more or less 
than 16 ounces of liquid. In one study, eight 16-ounce cans were measured and produced the following
amounts (in ounces) of beverage:

15.8; 16.1; 15.2; 14.8; 15.8; 15.9; 16.0; 15.5 

Measurements of the amount of beverage in a 16-ounce can may vary because different people make the 
measurements or because the exact amount, 16 ounces of liquid, was not put into the cans. Manufacturers 
regularly run tests to determine if the amount of beverage in a 16-ounce can falls within the desired range. 
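
As a rough illustration of the variation described above, the following Python sketch summarizes the eight measurements. The variable names are chosen here for illustration only; the text does not prescribe any particular summary.

```python
from statistics import mean

# Amounts (in ounces) measured in the eight 16-ounce cans from the study above.
amounts = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5]

smallest = min(amounts)
largest = max(amounts)
spread = largest - smallest      # distance between the largest and smallest amounts
average = mean(amounts)          # arithmetic average of the eight amounts

print(f"smallest = {smallest} oz, largest = {largest} oz")
print(f"spread = {spread:.1f} oz, average = {average:.2f} oz")
```

Only one of the eight cans contains exactly the labeled 16 ounces, which is the kind of variation a manufacturer's tests are meant to monitor.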

Be aware that as you take data, your data may vary somewhat from the data someone else is taking for the 
same purpose. This is completely natural. However, if two or more of you are taking the same data and 
get very different results, it is time for you and the others to reevaluate your data-taking methods and your 
accuracy. 

1.6.2 Variation in Samples 

It was mentioned previously that two or more samples from the same population, taken randomly and
each having close to the same characteristics as the population, are different from each other. Suppose Doreen and
Jung both decide to study the average amount of time students at their college sleep each night. Doreen and 
Jung each take samples of 500 students. Doreen uses systematic sampling and Jung uses cluster sampling. 
Doreen's sample will be different from Jung's sample. Even if Doreen and Jung used the same sampling 
method, in all likelihood their samples would be different. Neither would be wrong, however. 



6 This content is available online at <http://cnx.org/content/m16021/1.15/>.

Think about what contributes to making Doreen's and Jung's samples different.

If Doreen and Jung took larger samples (i.e., the number of data values is increased), their sample results
(the average amount of time a student sleeps) might be closer to the actual population average. But still, 
their samples would be, in all likelihood, different from each other. This variability in samples cannot be 
stressed enough. 
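
The variability between samples can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the population of 10,000 sleep times is hypothetical (generated from a normal distribution with mean 7 hours and standard deviation 1.2 hours), and both samplers use simple random sampling rather than the systematic and cluster methods mentioned above.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch gives the same numbers on every run

# Hypothetical population: nightly sleep (in hours) for 10,000 students.
population = [random.gauss(7.0, 1.2) for _ in range(10_000)]
population_mean = mean(population)

def sample_mean(size):
    """Average of a simple random sample (without replacement) of the given size."""
    return mean(random.sample(population, size))

for size in (50, 500, 5000):
    doreen = sample_mean(size)   # Doreen's sample average
    jung = sample_mean(size)     # Jung's sample average, from a separate sample
    print(f"n = {size}: Doreen {doreen:.3f}, Jung {jung:.3f}, "
          f"population {population_mean:.3f}")
```

With larger samples the two averages tend to land closer to the population average, yet the two samples themselves still differ.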

1.6.2.1 Size of a Sample 

The size of a sample (often called the number of observations) is important. The examples you have seen 
in this book so far have been small. Samples of only a few hundred observations, or even smaller, are 
sufficient for many purposes. In polling, samples that are from 1200 to 1500 observations are considered 
large enough and good enough if the survey is random and is well done. You will learn why when you 
study confidence intervals. 



Be aware that many large samples are biased. For example, call-in surveys are invariably biased
because people choose to respond or not.



1.6.2.2 Optional Collaborative Classroom Exercise 

Exercise 1.6.1 

Divide into groups of two, three, or four. Your instructor will give each group one 6-sided die. 
Try this experiment twice. Roll one fair die (6-sided) 20 times. Record the number of ones, twos, 
threes, fours, fives, and sixes you get below ("frequency" is the number of times a particular face 
of the die occurs): 

First Experiment (20 rolls)

Face on Die    Frequency
1
2
3
4
5
6

Table 1.1

Second Experiment (20 rolls)

Face on Die    Frequency
1
2
3
4
5
6

Table 1.2

Did the two experiments have the same results? Probably not. If you did the experiment a third 
time, do you expect the results to be identical to the first or second experiment? (Answer yes or 
no.) Why or why not? 

Which experiment had the correct results? They both did. The job of the statistician is to see 
through the variability and draw appropriate conclusions. 
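
If no dice are available, the experiment can be imitated on a computer. The following is a minimal sketch, assuming Python's pseudo-random number generator is an acceptable stand-in for a fair die; the function name `roll_experiment` is made up for this illustration.

```python
import random
from collections import Counter

def roll_experiment(rolls=20):
    """Roll a fair six-sided die `rolls` times and tally how often each face occurs."""
    outcomes = [random.randint(1, 6) for _ in range(rolls)]
    tally = Counter(outcomes)
    # Report every face, including faces that never came up.
    return {face: tally.get(face, 0) for face in range(1, 7)}

first = roll_experiment()
second = roll_experiment()

print("Face  First  Second")
for face in range(1, 7):
    print(f"{face:>4}  {first[face]:>5}  {second[face]:>6}")
```

Running the sketch several times usually gives different frequency columns, just as the two classroom experiments do.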



1.6.3 Critical Evaluation 

We need to critically evaluate the statistical studies we read about and analyze before accepting the results 
of the study. Common problems to be aware of include 

• Problems with Samples: A sample should be representative of the population. A sample that is not
representative of the population is biased. Biased samples give results that are inaccurate and not valid.

• Self-Selected Samples: Responses only by people who choose to respond, such as call-in surveys, are
often unreliable.

• Sample Size Issues: Samples that are too small may be unreliable. Larger samples are better if possible.
In some situations, small samples are unavoidable and can still be used to draw conclusions, even
though larger samples are better. Examples: Crash testing cars, medical testing for rare conditions.

• Undue Influence: Collecting data or asking questions in a way that influences the response.

• Non-response or refusal of subject to participate: The collected responses may no longer be
representative of the population. Often, people with strong positive or negative opinions may answer surveys,
which can affect the results.

• Causality: A relationship between two variables does not mean that one causes the other to occur.
They may both be related (correlated) because of their relationship through a different variable.

• Self-Funded or Self-Interest Studies: A study performed by a person or organization in order to
support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not
automatically assume that the study is good, but do not automatically assume the study is bad either.
Evaluate it on its merits and the work done.

• Misleading Use of Data: Improperly displayed graphs, incomplete data, lack of context. 

• Confounding: When the effects of multiple factors on a response cannot be separated. Confounding 
makes it difficult or impossible to draw valid conclusions about the effect of each factor. 



1.7 Answers and Rounding Off 7 

A simple way to round off answers is to carry your final answer one more decimal place than was present 
in the original data. Round only the final answer. Do not round any intermediate results, if possible. If it 
becomes necessary to round intermediate results, carry them to at least twice as many decimal places as the 
final answer. For example, the average of the three quiz scores 4, 6, 9 is 6.3, rounded to the nearest tenth, 
because the data are whole numbers. Most answers will be rounded in this manner. 
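
The rounding rule can be sketched in a few lines of Python. This is only an illustration of the quiz-score example; the variable names are chosen here for convenience.

```python
scores = [4, 6, 9]                     # whole numbers: zero decimal places in the data

average = sum(scores) / len(scores)    # intermediate result kept at full precision
answer = round(average, 1)             # final answer: one more decimal place than the data

print(answer)  # 6.3
```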

It is not necessary to reduce most fractions in this course. Especially in Probability Topics (Section 4.1), the 
chapter on probability, it is more helpful to leave an answer as an unreduced fraction. 



7 This content is available online at <http://cnx.org/content/ml6006/!. 7/>.

1.8 Frequency 8



Twenty students were asked how many hours they worked per day. Their responses, in hours, are listed 
below: 

5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3 

Below is a frequency table listing the different data values in ascending order and their frequencies. 

Frequency Table of Student Work Hours

DATA VALUE    FREQUENCY
2             3
3             5
4             3
5             6
6             2
7             1

Table 1.3

A frequency is the number of times a given datum occurs in a data set. According to the table above, 
there are three students who work 2 hours, five students who work 3 hours, etc. The total of the frequency 
column, 20, represents the total number of students included in the sample. 
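
The frequency table above can be reproduced by counting the responses. The following sketch uses Python's `collections.Counter`; it is one convenient way to do the tally, not a method the text prescribes.

```python
from collections import Counter

# Hours worked per day reported by the twenty students.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

frequency = Counter(hours)             # maps each data value to its frequency

print("DATA VALUE  FREQUENCY")
for value in sorted(frequency):        # list the data values in ascending order
    print(f"{value:>10}  {frequency[value]:>9}")
print("Total:", sum(frequency.values()))
```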

A relative frequency is the fraction or proportion of times an answer occurs. To find the relative fre- 
quencies, divide each frequency by the total number of students in the sample - in this case, 20. Relative 
frequencies can be written as fractions, percents, or decimals. 

Frequency Table of Student Work Hours w/ Relative Frequency

DATA VALUE    FREQUENCY    RELATIVE FREQUENCY
2             3            3/20 or 0.15
3             5            5/20 or 0.25
4             3            3/20 or 0.15
5             6            6/20 or 0.30
6             2            2/20 or 0.10
7             1            1/20 or 0.05

Table 1.4

The sum of the relative frequency column is 20/20, or 1.
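
As a sketch of the division step, the relative frequencies can be computed with exact fractions so that their sum is exactly one. The frequencies are copied from the student work hours table; the use of Python's `fractions.Fraction` is a choice made for this illustration.

```python
from fractions import Fraction

# Frequencies from the student work hours table (data value -> frequency).
frequency = {2: 3, 3: 5, 4: 3, 5: 6, 6: 2, 7: 1}
total = sum(frequency.values())        # 20 students in the sample

# Relative frequency = frequency divided by the total number of data values.
relative = {value: Fraction(count, total) for value, count in frequency.items()}

for value, rf in relative.items():
    print(f"{value}: {frequency[value]}/{total} or {float(rf):.2f}")
print("Sum of relative frequencies:", sum(relative.values()))
```

`Fraction` stores each value in lowest terms (for example, 5/20 is kept as 1/4), so the sketch prints the unreduced form from the original counts.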

Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the cumu- 
lative relative frequencies, add all the previous relative frequencies to the relative frequency for the current 
row. 

Frequency Table of Student Work Hours w/ Relative and Cumulative Relative Frequency

DATA VALUE    FREQUENCY    RELATIVE FREQUENCY    CUMULATIVE RELATIVE FREQUENCY
2             3            3/20 or 0.15          0.15
3             5            5/20 or 0.25          0.15 + 0.25 = 0.40
4             3            3/20 or 0.15          0.40 + 0.15 = 0.55
5             6            6/20 or 0.30          0.55 + 0.30 = 0.85
6             2            2/20 or 0.10          0.85 + 0.10 = 0.95
7             1            1/20 or 0.05          0.95 + 0.05 = 1.00

Table 1.5

8 This content is available online at <http://cnx.org/content/m16012/1.19/>.

The last entry of the cumulative relative frequency column is one, indicating that one hundred percent of 
the data has been accumulated. 

NOTE: Because of rounding, the relative frequency column may not always sum to one and the last 
entry in the cumulative relative frequency column may not be one. However, they each should be 
close to one. 
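
The running totals, and the rounding effect mentioned in the note, can both be sketched in Python. The first part uses the student work hours frequencies; the second part uses a hypothetical data set with three equally common values, introduced here only to show how rounding can leave the sum slightly short of one.

```python
from itertools import accumulate

# Frequencies from the student work hours table (data value -> frequency).
frequency = {2: 3, 3: 5, 4: 3, 5: 6, 6: 2, 7: 1}
total = sum(frequency.values())

relative = [count / total for count in frequency.values()]
cumulative = list(accumulate(relative))   # running sum of the relative frequencies

for value, rf, crf in zip(frequency, relative, cumulative):
    print(f"{value}  {rf:.2f}  {crf:.2f}")

# Rounding effect (hypothetical data): three values, each occurring 1/3 of the time.
thirds = [round(1 / 3, 2)] * 3            # each relative frequency is reported as 0.33
print(round(sum(thirds), 2))              # the reported values add to 0.99, not 1
```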

The following table represents the heights, in inches, of a sample of 100 male semiprofessional soccer play- 
ers. 

Frequency Table of Soccer Player Height

HEIGHTS (INCHES)    FREQUENCY      RELATIVE FREQUENCY    CUMULATIVE RELATIVE FREQUENCY
59.95 - 61.95       5              5/100 = 0.05          0.05
61.95 - 63.95       3              3/100 = 0.03          0.05 + 0.03 = 0.08
63.95 - 65.95       15             15/100 = 0.15         0.08 + 0.15 = 0.23
65.95 - 67.95       40             40/100 = 0.40         0.23 + 0.40 = 0.63
67.95 - 69.95       17             17/100 = 0.17         0.63 + 0.17 = 0.80
69.95 - 71.95       12             12/100 = 0.12         0.80 + 0.12 = 0.92
71.95 - 73.95       7              7/100 = 0.07          0.92 + 0.07 = 0.99
73.95 - 75.95       1              1/100 = 0.01          0.99 + 0.01 = 1.00
                    Total = 100    Total = 1.00

Table 1.6

The data in this table has been grouped into the following intervals: 



• 59.95 - 61.95 inches

• 61.95 - 63.95 inches

• 63.95 - 65.95 inches

• 65.95 - 67.95 inches

• 67.95 - 69.95 inches

• 69.95 - 71.95 inches

• 71.95 - 73.95 inches

• 73.95 - 75.95 inches

NOTE: This example is used again in the Descriptive Statistics (Section 2.1) chapter, where the
method used to compute the intervals will be explained. 

In this sample, there are 5 players whose heights are between 59.95 - 61.95 inches, 3 players whose heights 
fall within the interval 61.95 - 63.95 inches, 15 players whose heights fall within the interval 63.95 - 65.95 
inches, 40 players whose heights fall within the interval 65.95 - 67.95 inches, 17 players whose heights 
fall within the interval 67.95 - 69.95 inches, 12 players whose heights fall within the interval 69.95 - 71.95, 
7 players whose height falls within the interval 71.95 - 73.95, and 1 player whose height falls within the 
interval 73.95 - 75.95. All heights fall between the endpoints of an interval and not at the endpoints. 

Example 1.6 

From the table, find the percentage of heights that are less than 65.95 inches. 

Solution 

If you look at the first, second, and third rows, the heights are all less than 65.95 inches. There are 
5 + 3 + 15 = 23 males whose heights are less than 65.95 inches. The percentage of heights less than 
65.95 inches is then 23/100, or 23%. This percentage is the cumulative relative frequency entry in the
third row. 



Example 1.7 

From the table, find the percentage of heights that fall between 61.95 and 65.95 inches. 

Solution 

Add the relative frequencies in the second and third rows: 0.03 + 0.15 = 0.18 or 18%. 
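
The calculations in Examples 1.6 and 1.7 can be sketched as a small Python helper. The function name `percent_between` is hypothetical, and the helper assumes the requested limits coincide with interval endpoints from the table, as they do in both examples.

```python
# (lower endpoint, upper endpoint, frequency) for each row of the soccer height table
rows = [
    (59.95, 61.95, 5),
    (61.95, 63.95, 3),
    (63.95, 65.95, 15),
    (65.95, 67.95, 40),
    (67.95, 69.95, 17),
    (69.95, 71.95, 12),
    (71.95, 73.95, 7),
    (73.95, 75.95, 1),
]
total = sum(freq for _, _, freq in rows)   # 100 players in the sample

def percent_between(low, high):
    """Percentage of heights in the table intervals that lie entirely within [low, high]."""
    count = sum(freq for lo, hi, freq in rows if lo >= low and hi <= high)
    return 100 * count / total

print(percent_between(59.95, 65.95))  # Example 1.6: heights less than 65.95 inches
print(percent_between(61.95, 65.95))  # Example 1.7: heights between 61.95 and 65.95 inches
```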



Example 1.8 

Use the table of heights of the 100 male semiprofessional soccer players. Fill in the blanks and 
check your answers. 

1. The percentage of heights that are from 67.95 to 71.95 inches is: 

2. The percentage of heights that are from 67.95 to 73.95 inches is: 

3. The percentage of heights that are more than 65.95 inches is: 

4. The number of players in the sample who are between 61.95 and 71.95 inches tall is: 

5. What kind of data are the heights? 

6. Describe how you could gather this data (the heights) so that the data are characteristic of all 
male semiprofessional soccer players. 

Remember, you count frequencies. To find the relative frequency, divide the frequency by the 
total number of data values. To find the cumulative relative frequency, add all of the previous 
relative frequencies to the relative frequency for the current row. 



1.8.1 Optional Collaborative Classroom Exercise 

Exercise 1.8.1 

In your class, have someone conduct a survey of the number of siblings (brothers and sisters) each
student has. Create a frequency table. Add to it a relative frequency column and a cumulative
relative frequency column. Answer the following questions:

1. What percentage of the students in your class has siblings?

2. What percentage of the students has from 1 to 3 siblings? 

3. What percentage of the students has fewer than 3 siblings? 

Example 1.9 

Nineteen people were asked how many miles, to the nearest mile, they commute to work each
day. The data are as follows: 

2; 5; 7; 3; 2; 10; 18; 15; 20; 7; 10; 18; 5; 12; 13; 12; 4; 5; 10 

The following table was produced: 

Frequency of Commuting Distances

DATA    FREQUENCY    RELATIVE FREQUENCY    CUMULATIVE RELATIVE FREQUENCY
3       3            3/19                  0.1579
4       1            1/19                  0.2105
5       3            3/19                  0.1579
7       2            2/19                  0.2632
10      3            4/19                  0.4737
12      2            2/19                  0.7895
13      1            1/19                  0.8421
15      1            1/19                  0.8948
18      1            1/19                  0.9474
20      1            1/19                  1.0000

Table 1.7



Problem (Solution on p. 42.)

1. Is the table correct? If it is not correct, what is wrong?

2. True or False: Three percent of the people surveyed commute 3 miles. If the statement is not 
correct, what should it be? If the table is incorrect, make the corrections. 

3. What fraction of the people surveyed commute 5 or 7 miles? 

4. What fraction of the people surveyed commute 12 miles or more? Less than 12 miles? Between
5 and 13 miles (does not include 5 and 13 miles)?

1.9 Summary 9



Statistics 

• Deals with the collection, analysis, interpretation, and presentation of data 

Probability 

• Mathematical tool used to study randomness 

Key Terms 

• Population 

• Parameter 

• Sample 

• Statistic 

• Variable 

• Data 

Types of Data 

• Quantitative Data (a number)

    • Discrete (You count it.)

    • Continuous (You measure it.)

• Qualitative Data (a category, words)

Sampling 

• With Replacement: A member of the population may be chosen more than once 

• Without Replacement: A member of the population may be chosen only once 

Random Sampling 

• Each member of the population has an equal chance of being selected 

Sampling Methods 

• Random

    • Simple random sample

    • Stratified sample

    • Cluster sample

    • Systematic sample

• Not Random

    • Convenience sample

Frequency (freq. or f) 

• The number of times an answer occurs 

Relative Frequency (rel. freq. or RF) 

• The proportion of times an answer occurs 

• Can be interpreted as a fraction, decimal, or percent 

Cumulative Relative Frequencies (cum. rel. freq. or cum RF) 

• An accumulation of the previous relative frequencies 
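
The sampling terms in this summary can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the population is a hypothetical list of 100 member ID numbers, and the group sizes and sample sizes are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same samples on every run

population = list(range(1, 101))   # hypothetical population: member IDs 1 to 100

# Without replacement: a member of the population may be chosen only once.
without_replacement = random.sample(population, 10)

# With replacement: a member of the population may be chosen more than once.
with_replacement = random.choices(population, k=10)

# Systematic sample: pick a random starting point, then take every 10th member.
start = random.randrange(10)
systematic = population[start::10]

# Stratified sample: divide the population into groups (strata) and
# take a random sample from every group.
strata = [population[i:i + 25] for i in range(0, 100, 25)]
stratified = [member for group in strata for member in random.sample(group, 3)]

# Cluster sample: divide the population into groups (clusters), randomly
# choose some of the groups, and use every member of the chosen groups.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [member for group in random.sample(clusters, 2) for member in group]

print(without_replacement, with_replacement, systematic, stratified, cluster_sample, sep="\n")
```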

9 This content is available online at <http://cnx.org/content/m16023/1.10/>.

1.10 Practice: Sampling and Data 10

1.10.1 Student Learning Outcomes 

• The student will construct frequency tables. 

• The student will differentiate between key terms. 

• The student will compare sampling techniques. 



1.10.2 Given 

Studies are often done by pharmaceutical companies to determine the effectiveness of a treatment program. 
Suppose that a new AIDS antibody drug is currently under study. It is given to patients once the AIDS 
symptoms have revealed themselves. Of interest is the average length of time in months patients live once 
starting the treatment. Two researchers each follow a different set of 40 AIDS patients from the start of 
treatment until their deaths. The following data (in months) are collected. 

Researcher A 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13; 21; 22; 10; 12; 8; 40; 32; 
26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34 

Researcher B 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21; 21; 16; 12; 18; 41; 22; 
16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29 

1.10.3 Organize the Data 

Complete the tables below using the data provided. 

Researcher A

Survival Length (in months)    Frequency    Relative Frequency    Cumulative Relative Frequency
0.5 - 6.5
6.5 - 12.5
12.5 - 18.5
18.5 - 24.5
24.5 - 30.5
30.5 - 36.5
36.5 - 42.5
42.5 - 48.5

Table 1.8

Researcher B

Survival Length (in months)    Frequency    Relative Frequency    Cumulative Relative Frequency
0.5 - 6.5
6.5 - 12.5
12.5 - 18.5
18.5 - 24.5
24.5 - 30.5
30.5 - 36.5
36.5 - 42.5
42.5 - 48.5

Table 1.9

10 This content is available online at <http://cnx.org/content/m16016/1.14/>.



1.10.4 Key Terms 

Define the key terms based upon the above example for Researcher A. 

Exercise 1.10.1 

Population 

Exercise 1.10.2 

Sample 

Exercise 1.10.3 

Parameter 

Exercise 1.10.4 

Statistic 

Exercise 1.10.5 

Variable 

Exercise 1.10.6 

Data 



1.10.5 Discussion Questions 

Discuss the following questions and then answer in complete sentences. 

Exercise 1.10.7 

List two reasons why the data may differ. 

Exercise 1.10.8 

Can you tell if one researcher is correct and the other one is incorrect? Why? 

Exercise 1.10.9 

Would you expect the data to be identical? Why or why not? 

Exercise 1.10.10 

How could the researchers gather random data? 

Exercise 1.10.11 

Suppose that the first researcher conducted his survey by randomly choosing one state in the
nation and then randomly picking 40 patients from that state. What sampling method would that
researcher have used?

Exercise 1.10.12

Suppose that the second researcher conducted his survey by choosing 40 patients he knew. What
sampling method would that researcher have used? What concerns would you have about this
data set, based upon the data collection method?

1.11 Homework 11

Exercise 1.11.1 (Solution on p. 42.)

For each item below:

i. Identify the type of data (quantitative - discrete, quantitative - continuous, or qualitative) that
would be used to describe a response.

ii. Give an example of the data.

a. Number of tickets sold to a concert 

b. Amount of body fat 

c. Favorite baseball team 

d. Time in line to buy groceries 

e. Number of students enrolled at Evergreen Valley College 

f. Most-watched television show 

g. Brand of toothpaste 

h. Distance to the closest movie theatre 

i. Age of executives in Fortune 500 companies 

j. Number of competing computer spreadsheet software packages 

Exercise 1.11.2 

Fifty part-time students were asked how many courses they were taking this term. The (incom- 
plete) results are shown below: 

Part-time Student Course Loads

# of Courses    Frequency    Relative Frequency    Cumulative Relative Frequency
1               30           0.6                   ____
2               15           ____                  ____
3               ____         ____                  ____

Table 1.10

a. Fill in the blanks in the table above. 

b. What percent of students take exactly two courses? 

c. What percent of students take one or two courses? 

Exercise 1.11.3 (Solution on p. 42.) 

Sixty adults with gum disease were asked the number of times per week they used to floss before 
their diagnoses. The (incomplete) results are shown below: 

Flossing Frequency for Adults with Gum Disease

# Flossing per Week    Frequency    Relative Frequency    Cumulative Relative Freq.
0                      27           0.4500                ____
1                      18           ____                  ____
3                      ____         ____                  0.9333
6                      3            0.0500                ____
7                      1            0.0167                ____

Table 1.11

11 This content is available online at <http://cnx.org/content/m16010/1.18/>.

a. Fill in the blanks in the table above. 

b. What percent of adults flossed six times per week? 

c. What percent flossed at most three times per week? 

Exercise 1.11.4 

A fitness center is interested in the average amount of time a client exercises in the center each 
week. Define the following in terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.5 (Solution on p. 42.) 

Ski resorts are interested in the average age that children take their first ski and snowboard 
lessons. They need this information to optimally plan their ski classes. Define the following in 
terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.6 

A cardiologist is interested in the average recovery period for her patients who have had heart 
attacks. Define the following in terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.7 (Solution on p. 43.) 

Insurance companies are interested in the average health costs each year for their clients, so that 
they can determine the costs of health insurance. Define the following in terms of the study. Give 
examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data

Exercise 1.11.8

A politician is interested in the proportion of voters in his district that think he is doing a good 
job. Define the following in terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.9 (Solution on p. 43.) 

A marriage counselor is interested in the proportion of the clients she counsels that stay married.
Define the following in terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.10 

Political pollsters may be interested in the proportion of people that will vote for a particular 
cause. Define the following in terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.11 (Solution on p. 43.) 

A marketing company is interested in the proportion of people that will buy a particular product. 
Define the following in terms of the study. Give examples where appropriate. 

a. Population 

b. Sample 

c. Parameter 

d. Statistic 

e. Variable 

f. Data 

Exercise 1.11.12 

Airline companies are interested in the consistency of the number of babies on each flight, so that 
they have adequate safety equipment. Suppose an airline conducts a survey. Over Thanksgiving 
weekend, it surveys 6 flights from Boston to Salt Lake City to determine the number of babies on 
the flights. It determines the amount of safety equipment needed by the result of that study. 

a. Using complete sentences, list three things wrong with the way the survey was conducted. 

b. Using complete sentences, list three ways that you would improve the survey if it were to be
repeated.

Exercise 1.11.13

Suppose you want to determine the average number of students per statistics class in your state. 
Describe a possible sampling method in 3 - 5 complete sentences. Make the description detailed. 

Exercise 1.11.14 

Suppose you want to determine the average number of cans of soda drunk each month by persons 
in their twenties. Describe a possible sampling method in 3 - 5 complete sentences. Make the 
description detailed. 

Exercise 1.11.15 (Solution on p. 43.) 

771 distance learning students at Long Beach City College responded to surveys in the 2010-11
academic year. Highlights of the summary report are listed in the table below. (Source:
http://de.lbcc.edu/reports/2010-11/future/highlights.html#focus).

LBCC Distance Learning Survey Results

Have computer at home                              96%
Unable to come to campus for classes               65%
Age 41 or over                                     24%
Would like LBCC to offer more DL courses           95%
Took DL classes due to a disability                17%
Live at least 16 miles from campus                 13%
Took DL courses to fulfill transfer requirements   71%

Table 1.12



a. What percent of the students surveyed do not have a computer at home? 

b. About how many students in the survey live at least 16 miles from campus? 

c. If the same survey were done at Great Basin College in Elko, Nevada, do you think the percentages
would be the same? Why?

Exercise 1.11.16 

Nineteen immigrants to the U.S. were asked how many years, to the nearest year, they have lived
in the U.S. The data are as follows: 



2; 5; 7; 2; 2; 10; 20; 15; 0; 7; 0; 20; 5; 12; 15; 12; 4; 5; 10 

The following table was produced:

Frequency of Immigrant Survey Responses

Data    Frequency    Relative Frequency    Cumulative Relative Frequency
0       2            2/19                  0.1053
2       3            3/19                  0.2632
4       1            1/19                  0.3158
5       3            3/19                  0.1579
7       2            2/19                  0.5789
10      2            2/19                  0.6842
12      2            2/19                  0.7895
15      1            1/19                  0.8421
20      1            1/19                  1.0000

Table 1.13



a. Fix the errors on the table. Also, explain how someone might have arrived at the incorrect
number(s).

b. Explain what is wrong with this statement: "47 percent of the people surveyed have lived in
the U.S. for 5 years."

c. Fix the statement above to make it correct. 

d. What fraction of the people surveyed have lived in the U.S. 5 or 7 years? 

e. What fraction of the people surveyed have lived in the U.S. at most 12 years? 

f. What fraction of the people surveyed have lived in the U.S. fewer than 12 years? 

g. What fraction of the people surveyed have lived in the U.S. from 5 to 20 years, inclusive? 

Exercise 1.11.17 

A "random survey" was conducted of 3274 people of the "microprocessor generation" (people 
born since 1971, the year the microprocessor was invented). It was reported that 48% of those 
individuals surveyed stated that if they had $2000 to spend, they would use it for computer 
equipment. Also, 66% of those surveyed considered themselves relatively savvy computer users. 
(Source: San Jose Mercury News) 

a. Do you consider the sample size large enough for a study of this type? Why or why not? 

b. Based on your "gut feeling," do you believe the percents accurately reflect the U.S. population
for those individuals born since 1971? If not, do you think the percents of the population are
actually higher or lower than the sample statistics? Why?

Additional information: The survey was reported by Intel Corporation of individuals who visited
the Los Angeles Convention Center to see the Smithsonian Institution's road show called "America's
Smithsonian."

c. With this additional information, do you feel that all demographic and ethnic groups were
equally represented at the event? Why or why not?

d. With the additional information, comment on how accurately you think the sample statistics
reflect the population parameters.

Exercise 1.11.18

a. List some practical difficulties involved in getting accurate results from a telephone survey.

b. List some practical difficulties involved in getting accurate results from a mailed survey.

c. With your classmates, brainstorm some ways to overcome these problems if you needed to
conduct a phone or mail survey.



1.11.1 Try these multiple choice questions 

The next four questions refer to the following: A Lake Tahoe Community College instructor is interested 
in the average number of days Lake Tahoe Community College math students are absent from class during 
a quarter. 

Exercise 1.11.19 (Solution on p. 43.) 

What is the population she is interested in? 

A. All Lake Tahoe Community College students 

B. All Lake Tahoe Community College English students 

C. All Lake Tahoe Community College students in her classes 

D. All Lake Tahoe Community College math students 

Exercise 1.11.20 (Solution on p. 43.) 

Consider the following: 

X = number of days a Lake Tahoe Community College math student is absent 

In this case, X is an example of a: 

A. Variable 

B. Population 

C. Statistic 

D. Data 

Exercise 1.11.21 (Solution on p. 43.) 

The instructor takes her sample by gathering data on 5 randomly selected students from each 
Lake Tahoe Community College math class. The type of sampling she used is 

A. Cluster sampling 

B. Stratified sampling 

C. Simple random sampling 

D. Convenience sampling 

Exercise 1.11.22 (Solution on p. 43.) 

The instructor's sample produces an average number of days absent of 3.5 days. This value is an 
example of a 

A. Parameter 

B. Data 

C. Statistic 

D. Variable 

The next two questions refer to the following relative frequency table on hurricanes that have made direct
hits on the U.S. between 1851 and 2004. Hurricanes are given a strength category rating based on the
minimum wind speed generated by the storm. (http://www.nhc.noaa.gov/gifs/table5.gif 12)

Frequency of Hurricane Direct Hits

Category    Number of Direct Hits    Relative Frequency    Cumulative Frequency
1           109                      0.3993                0.3993
2           72                       0.2637                0.6630
3           71                       0.2601                ____
4           18                       ____                  0.9890
5           3                        0.0110                1.0000
            Total = 273

Table 1.14

12 http://www.nhc.noaa.gov/gifs/table5.gif



Exercise 1.11.23 (Solution on p. 43.)

What is the relative frequency of direct hits that were category 4 hurricanes? 

A. 0.0768 

B. 0.0659 

C. 0.2601 

D. Not enough information to calculate 

Exercise 1.11.24 (Solution on p. 43.) 

What is the relative frequency of direct hits that were AT MOST a category 3 storm? 

A. 0.3480 

B. 0.9231 

C. 0.2601 

D. 0.3370 

The next three questions refer to the following: A study was done to determine the age, number of times 
per week and the duration (amount of time) of resident use of a local park in San Jose. The first house in 
the neighborhood around the park was selected randomly and then every 8th house in the neighborhood 
around the park was interviewed. 

Exercise 1.11.25 (Solution on p. 43.)

"Number of times per week" is what type of data?

A. qualitative 

B. quantitative - discrete 

C. quantitative - continuous 



Exercise 1.11.26 (Solution on p. 43.)

The sampling method was:

A. simple random

B. systematic

C. stratified

D. cluster

Exercise 1.11.27 (Solution on p. 43.)

"Duration (amount of time)" is what type of data?

A. qualitative

B. quantitative - discrete

C. quantitative - continuous

Exercises 28 and 29 are not multiple choice exercises. 

Exercise 1.11.28 (Solution on p. 43.) 

Name the sampling method used in each of the following situations: 

A. A woman in the airport is handing out questionnaires to travelers asking them to evaluate the
airport's service. She does not ask travelers who are hurrying through the airport with their
hands full of luggage, but instead asks all travelers who are sitting near gates and not taking
naps while they wait.

B. A teacher wants to know if her students are doing homework, so she randomly selects rows 2
and 5, and then calls on all students in row 2 and all students in row 5 to present the solutions
to homework problems to the class.

C. The marketing manager for an electronics chain store wants information about the ages of its
customers. Over the next two weeks, at each store location, 100 randomly selected customers
are given questionnaires to fill out that ask for information about age, as well as about
other variables of interest.

D. The librarian at a public library wants to determine what proportion of the library users are
children. The librarian has a tally sheet on which she marks whether the books are checked
out by an adult or a child. She records this data for every 4th patron who checks out books.

E. A political party wants to know the reaction of voters to a debate between the candidates. The
day after the debate, the party's polling staff calls 1200 randomly selected phone numbers.
If a registered voter answers the phone or is available to come to the phone, that registered
voter is asked who he/she intends to vote for and whether the debate changed his/her
opinion of the candidates.

** Contributed by Roberta Bloom 

Exercise 1.11.29 (Solution on p. 44.) 

Several online textbook retailers advertise that they have lower prices than on-campus bookstores.
However, an important factor is whether the internet retailers actually have the textbooks
that students need in stock. Students need to be able to get textbooks promptly at the beginning of 
the college term. If the book is not available, then a student would not be able to get the textbook 
at all, or might get a delayed delivery if the book is back ordered. 

A college newspaper reporter is investigating textbook availability at online retailers. He 
decides to investigate one textbook for each of the following 7 subjects: calculus, biology, 
chemistry, physics, statistics, geology, and general engineering. He consults textbook industry 
sales data and selects the most popular nationally used textbook in each of these subjects. He 
visits websites for a random sample of major online textbook sellers and looks up each of these 7 
textbooks to see if they are available in stock for quick delivery through these retailers. Based on 
his investigation, he writes an article in which he draws conclusions about the overall availability 
of all college textbooks through online textbook retailers. 

Write an analysis of his study that addresses the following issues: Is his sample representative
of the population of all college textbooks? Explain why or why not. Describe some possible
sources of bias in this study, and how they might affect the results of the study. Give some
suggestions about what could be done to improve the study.

** Contributed by Roberta Bloom

1.12 Lab 1: Data Collection 13

Class Time: 
Names: 

1.12.1 Student Learning Outcomes 

• The student will demonstrate the systematic sampling technique. 

• The student will construct Relative Frequency Tables. 

• The student will interpret results and their differences from different data groupings. 



1.12.2 Movie Survey 

Ask five classmates from a different class how many movies they saw last month at the theater. Do not 
include rented movies. 

1. Record the data.

2. In class, randomly pick one person. On the class list, mark that person's name. Move down four 
people's names on the class list. Mark that person's name. Continue doing this until you have marked 
12 people's names. You may need to go back to the start of the list. For each marked name record 
below the five data values. You now have a total of 60 data values. 

3. For each name marked, record the data: 



Table 1.15 

1.12.3 Order the Data 

Complete the two relative frequency tables below using your class data. 



13 This content is available online at <http://cnx.org/content/m16004/1.11/>.

Frequency of Number of Movies Viewed

Number of Movies | Frequency | Relative Frequency | Cumulative Relative Frequency
0                |           |                    |
1                |           |                    |
2                |           |                    |
3                |           |                    |
4                |           |                    |
5                |           |                    |
6                |           |                    |
7+               |           |                    |

Table 1.16
Frequency of Number of Movies Viewed

Number of Movies | Frequency | Relative Frequency | Cumulative Relative Frequency
0-1              |           |                    |
2-3              |           |                    |
4-5              |           |                    |
6-7+             |           |                    |

Table 1.17



1. Using the tables, find the percent of data that is at most 2. Which table did you use and why? 

2. Using the tables, find the percent of data that is at most 3. Which table did you use and why? 

3. Using the tables, find the percent of data that is more than 2. Which table did you use and why? 

4. Using the tables, find the percent of data that is more than 3. Which table did you use and why? 
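The two groupings can be compared with a short Python sketch. The data values below are hypothetical placeholders (not real class results), and the helper name `frequency_table` is an assumption made for illustration only.

```python
from collections import Counter

def frequency_table(values, groups):
    """Return rows of (label, frequency, relative frequency, cumulative relative frequency).

    `groups` is a list of (label, low, high) with inclusive bounds; use
    float("inf") as `high` for an open-ended group such as "7+".
    """
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for label, low, high in groups:
        freq = sum(c for v, c in counts.items() if low <= v <= high)
        cumulative += freq
        rows.append((label, freq, freq / n, cumulative / n))
    return rows

# Hypothetical survey answers: number of movies seen last month.
movies = [0, 1, 1, 2, 2, 2, 3, 4, 5, 7]

# Grouping used in the first table (one row per value) and in the second (paired values).
single = [(str(k), k, k) for k in range(7)] + [("7+", 7, float("inf"))]
paired = [("0-1", 0, 1), ("2-3", 2, 3), ("4-5", 4, 5), ("6-7+", 6, float("inf"))]

for grouping in (single, paired):
    for label, freq, rel, cum in frequency_table(movies, grouping):
        print(f"{label:>5}  {freq:>2}  {rel:.4f}  {cum:.4f}")
    print()
```

With this made-up data, "at most 2" can be read only from the single-value table, because the paired table jumps from "at most 1" to "at most 3".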



1.12.4 Discussion Questions 

1. Is one of the tables above "more correct" than the other? Why or why not? 

2. In general, why would someone group the data in different ways? Are there any advantages to either 
way of grouping the data? 

3. Why did you switch between tables, if you did, when answering the questions above?

1.13 Lab 2: Sampling Experiment 14

Class Time:
Names:



1.13.1 Student Learning Outcomes 

• The student will demonstrate the simple random, systematic, stratified, and cluster sampling techniques.

• The student will explain the details of each procedure used.

In this lab, you will be asked to pick several random samples. In each case, describe your procedure briefly,
including how you might have used the random number generator, and then list the restaurants in the
sample you obtained.

NOTE: The following section contains restaurants stratified by city into columns and grouped 
horizontally by entree cost (clusters). 

1.13.2 A Simple Random Sample 

Pick a simple random sample of 15 restaurants. 

1. Describe the procedure: 

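2.

    1. __________     6. __________    11. __________
    2. __________     7. __________    12. __________
    3. __________     8. __________    13. __________
    4. __________     9. __________    14. __________
    5. __________    10. __________    15. __________

Table 1.18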



1.13.3 A Systematic Sample 

Pick a systematic sample of 15 restaurants. 

1. Describe the procedure: 

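2.

    1. __________     6. __________    11. __________
    2. __________     7. __________    12. __________
    3. __________     8. __________    13. __________
    4. __________     9. __________    14. __________
    5. __________    10. __________    15. __________

Table 1.19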



14 This content is available online at <http://cnx.org/content/m16013/1.15/>.

1.13.4 A Stratified Sample

Pick a stratified sample, by city, of 20 restaurants. Use 25% of the restaurants from each stratum. Round to 
the nearest whole number. 

1. Describe the procedure: 

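2.

    1. __________     6. __________    11. __________    16. __________
    2. __________     7. __________    12. __________    17. __________
    3. __________     8. __________    13. __________    18. __________
    4. __________     9. __________    14. __________    19. __________
    5. __________    10. __________    15. __________    20. __________

Table 1.20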



1.13.5 A Stratified Sample 

Pick a stratified sample, by entree cost, of 21 restaurants. Use 25% of the restaurants from each stratum. 
Round to the nearest whole number. 

1. Describe the procedure: 

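2.

    1. __________     6. __________    11. __________    16. __________
    2. __________     7. __________    12. __________    17. __________
    3. __________     8. __________    13. __________    18. __________
    4. __________     9. __________    14. __________    19. __________
    5. __________    10. __________    15. __________    20. __________
                                                          21. __________

Table 1.21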

1.13.6 A Cluster Sample 

Pick a cluster sample of restaurants from two cities. The number of restaurants will vary. 

1. Describe the procedure: 

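2.

    1. __________     6. __________    11. __________    16. __________    21. __________
    2. __________     7. __________    12. __________    17. __________    22. __________
    3. __________     8. __________    13. __________    18. __________    23. __________
    4. __________     9. __________    14. __________    19. __________    24. __________
    5. __________    10. __________    15. __________    20. __________    25. __________

Table 1.22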



1.13.7 Restaurants Stratified by City and Entree Cost 

Restaurants Used in Sample 
Entree cost columns: Under $10; $10 to under $15; $15 to under $20; Over $20

San Jose
    Under $10: El Abuelo Taq, Pasta Mia, Emma's Express, Bamboo Hut
    $10 to under $15: Emperor's Guard, Creekside Inn
    $15 to under $20: Agenda, Gervais, Miro's
    Over $20: Blake's, Eulipia, Hayes Mansion, Germania

Palo Alto
    Under $10: Senor Taco, Olive Garden, Taxi's
    $10 to under $15: Ming's, P.A. Joe's, Stickney's
    $15 to under $20: Scott's Seafood, Poolside Grill, Fish Market
    Over $20: Sundance Mine, Maddalena's, Spago's

Los Gatos
    Under $10: Mary's Patio, Mount Everest, Sweet Pea's, Andele Taqueria
    $10 to under $15: Lindsey's, Willow Street
    $15 to under $20: Toll House
    Over $20: Charter House, La Maison Du Cafe

Mountain View
    Under $10: Maharaja, New Ma's, Thai-Rific, Garden Fresh
    $10 to under $15: Amber Indian, La Fiesta, Fiesta del Mar, Dawit
    $15 to under $20: Austin's, Shiva's, Mazeh
    Over $20: Le Petit Bistro

Cupertino
    Under $10: Hobees, Hung Fu, Samrat, Panda Express
    $10 to under $15: Santa Barb. Grill, Mand. Gourmet, Bombay Oven, Kathmandu West
    $15 to under $20: Fontana's, Blue Pheasant
    Over $20: Hamasushi, Helios

Sunnyvale
    Under $10: Chekijababi, Taj India, Full Throttle, Tia Juana, Lemon Grass
    $10 to under $15: Pacific Fresh, Charley Brown's, Cafe Cameroon, Faz, Aruba's
    $15 to under $20: Lion & Compass, The Palace, Beau Sejour
    Over $20: (none listed)

Santa Clara
    Under $10: Rangoli, Armadillo Willy's, Thai Pepper, Pasand
    $10 to under $15: Arthur's, Katie's Cafe, Pedro's, La Galleria
    $15 to under $20: Birk's, Truya Sushi, Valley Plaza
    Over $20: Lakeside, Mariani's

Table 1.23



NOTE: The original lab was designed and contributed by Carol Olmstead. 
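
The four sampling techniques named in this lab can be sketched in Python. This is only an illustration under stated assumptions: the function names are invented here, the population is a set of placeholder labels rather than the restaurants in Table 1.23, and the seed is arbitrary.

```python
import random

def simple_random_sample(items, k, rng):
    """Choose k items so that every group of k is equally likely."""
    return rng.sample(items, k)

def systematic_sample(items, k, rng):
    """Start at a random position, then take every step-th item, wrapping around the list."""
    step = len(items) // k
    start = rng.randrange(len(items))
    return [items[(start + i * step) % len(items)] for i in range(k)]

def stratified_sample(strata, fraction, rng):
    """Take a simple random sample of the same fraction from every stratum."""
    chosen = []
    for members in strata.values():
        # Note: Python's round() sends exact halves to the nearest even integer.
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

def cluster_sample(clusters, how_many, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    picked = rng.sample(sorted(clusters), how_many)
    return [member for name in picked for member in clusters[name]]

# Hypothetical population: placeholder names grouped under made-up city labels.
by_city = {
    "City A": ["A1", "A2", "A3", "A4"],
    "City B": ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"],
    "City C": ["C1", "C2", "C3", "C4"],
}
everyone = [name for members in by_city.values() for name in members]
rng = random.Random(2024)  # fixed seed so the run is repeatable

print(simple_random_sample(everyone, 5, rng))
print(systematic_sample(everyone, 4, rng))
print(stratified_sample(by_city, 0.25, rng))
print(cluster_sample(by_city, 2, rng))
```

The stratified sample always contains members from every group, while the cluster sample contains every member of only the chosen groups, which is why its size varies.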
Solutions to Exercises in Chapter 1

Solution to Example 1.5, Problem (p. 18) 

Items 1, 5, 11, and 12 are quantitative discrete; items 4, 6, 10, and 14 are quantitative continuous; and items 
2, 3, 7, 8, 9, and 13 are qualitative. 
Solution to Example 1.8, Problem (p. 23) 

1. 29% 

2. 36% 

3. 77% 

4. 87 

5. quantitative continuous 

6. get rosters from each team and choose a simple random sample from each 

Solution to Example 1.9, Problem (p. 24) 

1. No. Frequency column sums to 18, not 19. Not all cumulative relative frequencies are correct. 

2. False. Frequency for 3 miles should be 1; for 2 miles (left out), 2. Cumulative relative frequency 
column should read: 0.1052, 0.1579, 0.2105, 0.3684, 0.4737, 0.6316, 0.7368, 0.7895, 0.8421, 0.9474, 1. 

3. $\frac{5}{19}$

4. $\frac{7}{19}$, $\frac{12}{19}$, $\frac{7}{19}$



Solutions to Homework 
Solution to Exercise 1.11.1 (p. 29) 

a. quantitative - discrete 

b. quantitative - continuous 

c. qualitative 

d. quantitative - continuous 

e. quantitative - discrete 

f. qualitative 

g. qualitative 

h. quantitative - continuous 
i. quantitative - continuous 
j. quantitative - discrete 

Solution to Exercise 1.11.3 (p. 29) 

a. Cum. Rel. Freq. for 0 is 0.4500

Rel. Freq. for 1 is 0.3000 and Cum. Rel. Freq. for 1 or less is 0.7500 
Freq. for 3 is 11 and Rel. Freq. is 0.1833 
Cum. Rel. Freq. for 6 or less is 0.9833 
Cum. Rel. Freq. for 7 or less is 1 

b. 5.00% 

c. 93.33% 

Solution to Exercise 1.11.5 (p. 30) 

a. Children who take ski or snowboard lessons 

b. A group of these children 

c. The population average 

d. The sample average 

e. X = the age of one child who takes the first ski or snowboard lesson 
f. Values for X, such as 3, 7, etc.
Solution to Exercise 1.11.7 (p. 30) 

a. The clients of the insurance companies 

b. A group of the clients 

c. The average health costs of the clients 

d. The average health costs of the sample 

e. X = the health costs of one client 

f. Values for X, such as 34, 9, 82, etc.

Solution to Exercise 1.11.9 (p. 31) 

a. All the clients of the counselor 

b. A group of the clients 

c. The proportion of all her clients who stay married 

d. The proportion of the sample who stay married 

e. X = the number of couples who stay married 

f. yes, no

Solution to Exercise 1.11.11 (p. 31) 

a. All people (maybe in a certain geographic area, such as the United States) 

b. A group of the people 

c. The proportion of all people who will buy the product 

d. The proportion of the sample who will buy the product 

e. X = the number of people who will buy it 

f. buy, not buy

Solution to Exercise 1.11.15 (p. 32) 

a: 4% 
b: 100 

Solution to Exercise 1.11.19 (p. 34)

D

Solution to Exercise 1.11.20 (p. 34)

A

Solution to Exercise 1.11.21 (p. 34)

B

Solution to Exercise 1.11.22 (p. 34)

C

Solution to Exercise 1.11.23 (p. 35)

B

Solution to Exercise 1.11.24 (p. 35)

B

Solution to Exercise 1.11.25 (p. 35)

B

Solution to Exercise 1.11.26 (p. 35)

B

Solution to Exercise 1.11.27 (p. 35)

C

Solution to Exercise 1.11.28 (p. 36) 

A. Convenience 
B. Cluster

C. Stratified 

D. Systematic 

E. Simple Random 

Solution to Exercise 1.11.29 (p. 36) 

The answer below contains some of the issues that students might discuss for this problem. Individual
students' answers may also identify other issues that pertain to this problem that are not included in the
answer below.

The sample is not representative of the population of all college textbooks. Two reasons why it is 
not representative are that he only sampled 7 subjects and he only investigated one textbook in each 
subject. There are several possible sources of bias in the study. The 7 subjects that he investigated are 
all in mathematics and the sciences; there are many subjects in the humanities, social sciences, and
other subject areas (for example: literature, art, history, psychology, sociology, business) that he did not
investigate at all. It may be that different subject areas exhibit different patterns of textbook availability, 
but his sample would not detect such results. 

He also only looked at the most popular textbook in each of the subjects he investigated. The availability
of the most popular textbooks may differ from the availability of other textbooks in one of two ways:

• the most popular textbooks may be more readily available online, because more new copies are
printed and more students nationwide are selling back their used copies OR

• the most popular textbooks may be harder to find available online, because more student demand 
exhausts the supply more quickly. 

In reality, many college students do not use the most popular textbook in their subject, and this study gives 
no useful information about the situation for those less popular textbooks. 

He could improve this study by 

• expanding the selection of subjects he investigates so that it is more representative of all subjects 
studied by college students and 

• expanding the selection of textbooks he investigates within each subject to include a mixed representation
of both the popular and less popular textbooks.



Chapter 2 



Descriptive Statistics 

2.1 Descriptive Statistics 1 

2.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Display data graphically and interpret graphs: stemplots, histograms and boxplots. 

• Recognize, describe, and calculate the measures of location of data: quartiles and percentiles. 

• Recognize, describe, and calculate the measures of the center of data: mean, median, and mode. 

• Recognize, describe, and calculate the measures of the spread of data: variance, standard deviation, 
and range. 

2.1.2 Introduction 

Once you have collected data, what will you do with it? Data can be described and presented in many 
different formats. For example, suppose you are interested in buying a house in a particular area. You may 
have no clue about the house prices, so you might ask your real estate agent to give you a sample data set 
of prices. Looking at all the prices in the sample often is overwhelming. A better way might be to look 
at the median price and the variation of prices. The median and variation are just two ways that you will 
learn to describe data. Your agent might also provide you with a graph of the data. 

In this chapter, you will study numerical and graphical ways to describe and display your data. This area 
of statistics is called "Descriptive Statistics". You will learn to calculate, and even more importantly, to 
interpret these measurements and graphs. 

2.2 Displaying Data 2 

A statistical graph is a tool that helps you learn about the shape or distribution of a sample. The graph can 
be a more effective way of presenting data than a mass of numbers because we can see where data clusters 
and where there are only a few data values. Newspapers and the Internet use graphs to show trends and 
to enable readers to compare facts and figures quickly. 

Statisticians often graph data first to get a picture of the data. Then, more formal tools may be applied. 



1 This content is available online at <http://cnx.org/content/m16300/1.9/>.
2 This content is available online at <http://cnx.org/content/m16297/1.9/>.
Some of the types of graphs that are used to summarize and organize data are the dot plot, the bar chart,
the histogram, the stem-and-leaf plot, the frequency polygon (a type of broken line graph), pie charts, and 
the boxplot. In this chapter, we will briefly look at stem-and-leaf plots, line graphs and bar graphs. Our 
emphasis will be on histograms and boxplots. 

2.3 Stem and Leaf Graphs (Stemplots), Line Graphs and Bar Graphs 3 

One simple graph, the stem-and-leaf graph or stemplot, comes from the field of exploratory data analysis. It
is a good choice when the data sets are small. To create the plot, divide each observation of data into a stem
and a leaf. The leaf consists of a final significant digit. For example, 23 has stem 2 and leaf 3. Four hundred
thirty-two (432) has stem 43 and leaf 2. Five thousand four hundred thirty-two (5,432) has stem 543 and leaf
2. The decimal 9.3 has stem 9 and leaf 3. Write the stems in a vertical line from smallest to largest. Draw a
vertical line to the right of the stems. Then write the leaves in increasing order next to their corresponding
stem.

Example 2.1 

For Susan Dean's spring pre-calculus class, scores for the first exam were as follows (smallest to 
largest): 



33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94;
96; 100

Stem-and-Leaf Diagram

Stem | Leaf
   3 | 3
   4 | 2 9 9
   5 | 3 5 5
   6 | 1 3 7 8 8 9 9
   7 | 2 3 4 8
   8 | 0 3 8 8 8
   9 | 0 2 4 4 4 4 6
  10 | 0

Table 2.1

The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight out of the 31 scores or 
approximately 26% of the scores were in the 90's or 100, a fairly high number of As. 

The stemplot is a quick way to graph and gives an exact picture of the data. You want to look for an overall 
pattern and any outliers. An outlier is an observation of data that does not fit the rest of the data. It is 
sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the 
graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500) while others may 
indicate that something unusual is happening. It takes some background information to explain outliers. 
In the example above, there were no outliers. 
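
The stem-and-leaf construction described above can be sketched in a few lines of Python. The function name `stemplot` is an assumption made for illustration, and the sketch handles only non-negative whole numbers; the scores are the ones shown in the stem-and-leaf diagram of Example 2.1.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf rows for non-negative whole numbers.

    The leaf is the final digit and the stem is everything before it.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    rows = []
    # Include every stem in the range, even one with no leaves, so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem:>3} | {' '.join(leaves.get(stem, []))}")
    return rows

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

rows = stemplot(scores)
print("\n".join(rows))
```

Printing empty stems is a deliberate choice: a long run of empty rows before an isolated value is how an outlier shows up in a stemplot.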

Example 2.2 

Create a stem plot using the data: 



3 This content is available online at <http://cnx.org/content/m16849/1.15/>.

1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5; 4.7; 4.8; 5.5; 5.6; 6.5; 6.7; 12.3

The data are the distance (in kilometers) from a home to the nearest supermarket. 

Problem (Solution on p. 91.) 

1. Are there any outliers? 

2. Do the data seem to have any concentration of values? 

HINT: The leaves are to the right of the decimal. 



Another type of graph that is useful for specific data values is a line graph. In the particular line graph 
shown in the example, the x-axis consists of data values and the y-axis consists of frequency points. The 
frequency points are connected. 

Example 2.3 

In a survey, 40 mothers were asked how many times per week a teenager must be reminded to do 
his/her chores. The results are shown in the table and the line graph. 



Number of times teenager is reminded | Frequency
0                                    | 2
1                                    | 5
2                                    | 8
3                                    | 14
4                                    | 7
5                                    | 4

Table 2.2



[Line graph: Number of Times Teenager is Reminded on the x-axis, Frequency on the y-axis, with the frequency points connected.]



Bar graphs consist of bars that are separated from each other. The bars can be rectangles or they can be 
rectangular boxes and they can be vertical or horizontal. 

The bar graph shown in Example 2.4 has age groups represented on the x-axis and proportions on the y-axis.

Example 2.4

By the end of 2011, in the United States, Facebook had over 146 million users. The table
shows three age groups, the number of users in each age group and the proportion (%) of
users in each age group. Source:
http://www.kenburbary.com/2011/03/facebook-demographics-revisited-2011-statistics-2/



Age groups | Number of Facebook users | Proportion (%) of Facebook users
13-25      | 65,082,280               | 45%
26-44      | 53,300,200               | 36%
45-64      | 27,885,100               | 19%

Table 2.3




Example 2.5 

The columns in the table below contain the race/ethnicity of U.S. Public Schools: High School
Class of 2011, percentages for the Advanced Placement Examinee Population for that class
and percentages for the Overall Student Population. The 3-dimensional graph shows the
Race/Ethnicity of U.S. Public Schools (qualitative data) on the x-axis and Advanced Placement
Examinee Population percentages on the y-axis. (Sources: http://www.collegeboard.com and
http://apreport.collegeboard.org/goals-and-findings/promoting-equity)



Race/Ethnicity                                | AP Examinee Population | Overall Student Population
1 = Asian, Asian American or Pacific Islander | 10.3%                  | 5.7%
2 = Black or African American                 | 9.0%                   | 14.7%
3 = Hispanic or Latino                        | 17.0%                  | 17.6%
4 = American Indian or Alaska Native          | 0.6%                   | 1.1%
5 = White                                     | 57.1%                  | 59.2%
6 = Not reported/other                        | 6.0%                   | 1.7%

Table 2.4



[Bar graph: Ethnicity/Race vs. Percent of AP Examinees.]



Go to Outcomes of Education Figure 22 4 for an example of a bar graph that shows unemployment rates of 
persons 25 years and older for 2009. 

NOTE: This book contains instructions for constructing a histogram and a box plot for the TI-83+ 
and TI-84 calculators. You can find additional instructions for using these calculators on the Texas 
Instruments (TI) website 5 . 



2.4 Histograms 6 

For most of the work you do in this book, you will use a histogram to display the data. One advantage of a 
histogram is that it can readily display large data sets. A rule of thumb is to use a histogram when the data 
set consists of 100 values or more. 

A histogram consists of contiguous boxes. It has both a horizontal axis and a vertical axis. The horizontal 
axis is labeled with what the data represents (for instance, distance from your home to school). The vertical 
axis is labeled either Frequency or relative frequency. The graph will have the same shape with either 
label. The histogram (like the stemplot) can give you the shape of the data, the center, and the spread of the 
data. (The next section tells you how to calculate the center and the spread.) 



4 http://nces.ed.gov/pubs2011/2011015_5.pdf

5 http://education.ti.com/educationportal/sites/US/sectionHome/support.html

6 This content is available online at <http://cnx.org/content/m16298/1.13/>.
The relative frequency is equal to the frequency for an observed value of the data divided by the total
number of data values in the sample. (In the chapter on Sampling and Data (Section 1.1), we defined
frequency as the number of times an answer occurs.) If:

$f$ = frequency

$n$ = total number of data values (or the sum of the individual frequencies), and

$RF$ = relative frequency,

then:

$RF = \frac{f}{n}$ (2.1)

For example, if 3 students in Mr. Ahab's English class of 40 students received from 90% to 100%, then
$f = 3$, $n = 40$, and $RF = \frac{f}{n} = \frac{3}{40} = 0.075$.

Seven and a half percent of the students received 90% to 100%. Ninety percent to 100% are quantitative
measures.
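
As a quick check of the arithmetic, the relative frequency can be computed in Python. The function name `relative_frequency` is an assumption for illustration; exact fractions are used so that no rounding enters before the final conversion.

```python
from fractions import Fraction

def relative_frequency(f, n):
    """Relative frequency RF = f / n, kept as an exact fraction."""
    return Fraction(f, n)

# Mr. Ahab's class: 3 of 40 students scored from 90% to 100%.
rf = relative_frequency(3, 40)
print(rf, float(rf), f"{float(rf):.1%}")
```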

To construct a histogram, first decide how many bars or intervals, also called classes, represent the data. 
Many histograms consist of from 5 to 15 bars or classes for clarity. Choose a starting point for the first 
interval to be less than the smallest data value. A convenient starting point is a lower value carried out 
to one more decimal place than the value with the most decimal places. For example, if the value with the 
most decimal places is 6.1 and this is the smallest value, a convenient starting point is 6.05 (6.1 - 0.05 = 6.05). 
We say that 6.05 has more precision. If the value with the most decimal places is 2.23 and the lowest value 
is 1.5, a convenient starting point is 1.495 (1.5 - 0.005 = 1.495). If the value with the most decimal places is 
3.234 and the lowest value is 1.0, a convenient starting point is 0.9995 (1.0 - 0.0005 = 0.9995). If all the data
happen to be integers and the smallest value is 2, then a convenient starting point is 1.5 (2 - 0.5 = 1.5). Also, 
when the starting point and other boundaries are carried to one additional decimal place, no data value 
will fall on a boundary. 
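
The rule for a convenient starting point can be written as a small Python sketch. The function name `convenient_start` is hypothetical, and the sketch assumes the data are supplied as strings exactly as recorded, so that the number of decimal places is not lost.

```python
from decimal import Decimal

def convenient_start(recorded_values):
    """Smallest value minus half a unit in the last recorded decimal place.

    `recorded_values` are strings such as "6.1" so trailing digits are preserved.
    """
    decimals = [Decimal(v) for v in recorded_values]
    # Most decimal places used by any value (0 for whole numbers).
    most_places = max(max(-d.as_tuple().exponent, 0) for d in decimals)
    # 0.5 for integers, 0.05 for one decimal place, 0.005 for two, and so on.
    half_unit = Decimal(5).scaleb(-(most_places + 1))
    return min(decimals) - half_unit

print(convenient_start(["6.1", "7.4", "8.0"]))   # most places: 1, smallest: 6.1
print(convenient_start(["2.23", "1.5"]))         # most places: 2, smallest: 1.5
print(convenient_start(["3.234", "1.0"]))        # most places: 3, smallest: 1.0
print(convenient_start(["2", "5", "9"]))         # integers, smallest: 2
```

Decimal arithmetic is used instead of floats so that results such as 1.5 - 0.005 come out as exactly 1.495.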

Example 2.6 

The following data are the heights (in inches to the nearest half inch) of 100 male semiprofessional 
soccer players. The heights are continuous data since height is measured. 

60; 60.5; 61; 61; 61.5

63.5; 63.5; 63.5

64; 64; 64; 64; 64; 64; 64; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5

66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5;
67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5

68; 68; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5

70; 70; 70; 70; 70; 70; 70.5; 70.5; 70.5; 71; 71; 71

72; 72; 72; 72.5; 72.5; 73; 73.5

74



The smallest data value is 60. Since the data with the most decimal places has one decimal (for 
instance, 61.5), we want our starting point to have two decimal places. Since the numbers 0.5, 
0.05, 0.005, etc. are convenient numbers, use 0.05 and subtract it from 60, the smallest value, for 
the convenient starting point. 
60 - 0.05 = 59.95, which is more precise than, say, 61.5 by one decimal place. The starting point is,
then, 59.95.

The largest value is 74. 74 + 0.05 = 74.05 is the ending value.

Next, calculate the width of each bar or class interval. To calculate this width, subtract the starting 
point from the ending value and divide by the number of bars (you must choose the number of 
bars you desire). Suppose you choose 8 bars. 

$\frac{74.05 - 59.95}{8} \approx 1.76$ (2.2)



NOTE: We will round up to 2 and make each bar or class interval 2 units wide. Rounding up to 2 is 
one way to prevent a value from falling on a boundary. Rounding to the next number is necessary 
even if it goes against the standard rules of rounding. For this example, using 1.76 as the width 
would also work. 

The boundaries are: 

59.95 

59.95 + 2 = 61.95 
61.95 + 2 = 63.95 
63.95 + 2 = 65.95 
65.95 + 2 = 67.95 
67.95 + 2 = 69.95 
69.95 + 2 = 71.95 
71.95 + 2 = 73.95 
73.95 + 2 = 75.95 

The heights 60 through 61.5 inches are in the interval 59.95 - 61.95. The heights that are 63.5 are 
in the interval 61.95 - 63.95. The heights that are 64 through 64.5 are in the interval 63.95 - 65.95. 
The heights 66 through 67.5 are in the interval 65.95 - 67.95. The heights 68 through 69.5 are in the 
interval 67.95 - 69.95. The heights 70 through 71 are in the interval 69.95 - 71.95. The heights 72 
through 73.5 are in the interval 71.95 - 73.95. The height 74 is in the interval 73.95 - 75.95. 
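
The assignment of heights to class intervals can be checked with a short Python sketch. It assumes the 100 heights are summarized as value-to-count pairs taken from the data listed in this example; the variable names are illustrative only.

```python
# Height (inches) -> how many of the 100 players have that height.
heights = {
    60: 1, 60.5: 1, 61: 2, 61.5: 1, 63.5: 3, 64: 7, 64.5: 8,
    66: 10, 66.5: 11, 67: 12, 67.5: 7, 68: 2, 69: 10, 69.5: 5,
    70: 6, 70.5: 3, 71: 3, 72: 3, 72.5: 2, 73: 1, 73.5: 1, 74: 1,
}

start, width, bars = 59.95, 2, 8

counts = [0] * bars
for value, freq in heights.items():
    # Index of the interval [start + i*width, start + (i+1)*width) containing value.
    counts[int((value - start) // width)] += freq

n = sum(counts)
for i, c in enumerate(counts):
    low = start + i * width
    print(f"{low:.2f} - {low + width:.2f}: frequency {c:>2}, relative frequency {c / n:.2f}")
```

Because every boundary ends in .95 and every height ends in .0 or .5, no value can land on a boundary, so the choice between open and closed interval ends never matters here.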

The following histogram displays the heights on the x-axis and relative frequency on the y-axis. 
[Histogram: Heights on the x-axis, with class boundaries 59.95, 61.95, 63.95, 65.95, 67.95, 69.95, 71.95, 73.95, and 75.95; Relative Frequency on the y-axis. Bar heights, left to right: 0.05, 0.03, 0.15, 0.4, 0.17, 0.12, 0.07, 0.01.]



Example 2.7 

The following data are the number of books bought by 50 part-time college students at ABC 
College. The number of books is discrete data since books are counted. 

1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1

2; 2; 2; 2; 2; 2; 2; 2; 2; 2

3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3

4; 4; 4; 4; 4; 4

5; 5; 5; 5; 5

6; 6

Eleven students buy 1 book. Ten students buy 2 books. Sixteen students buy 3 books. Six students 
buy 4 books. Five students buy 5 books. Two students buy 6 books. 

Because the data are integers, subtract 0.5 from 1, the smallest data value and add 0.5 to 6, the 
largest data value. Then the starting point is 0.5 and the ending value is 6.5. 

Problem (Solution on p. 91.) 

Next, calculate the width of each bar or class interval. If the data are discrete and there are not too 
many different values, a width that places the data values in the middle of the bar or class interval 
is the most convenient. Since the data consist of the numbers 1, 2, 3, 4, 5, 6 and the starting point is 
0.5, a width of one places the 1 in the middle of the interval from 0.5 to 1.5, the 2 in the middle of 
the interval from 1.5 to 2.5, the 3 in the middle of the interval from 2.5 to 3.5, the 4 in the middle of
the interval from ____ to ____, the 5 in the middle of the interval from ____ to ____,
and the ____ in the middle of the interval from ____ to ____.
Calculate the number of bars as follows:

$\frac{6.5 - 0.5}{\text{bars}} = 1$ (2.3)

where 1 is the width of a bar. Therefore, bars = 6.
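
The same calculation can be sketched in Python. The list of book counts is rebuilt from the frequencies stated in this example (11, 10, 16, 6, 5, and 2 students); the variable names are illustrative.

```python
from collections import Counter

# Number of books bought by each of the 50 students, rebuilt from the stated frequencies.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

start = min(books) - 0.5   # 0.5
end = max(books) + 0.5     # 6.5
width = 1                  # each integer sits in the middle of its bar

bars = round((end - start) / width)
frequencies = Counter(books)

print("bars:", bars)
for value in sorted(frequencies):
    print(f"{value - 0.5} to {value + 0.5}: frequency {frequencies[value]}")
```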

The following histogram displays the number of books on the x-axis and the frequency on the 
y-axis. 

[Histogram: Number of Books on the x-axis, with class boundaries 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, and 6.5; Frequency on the y-axis.]



2.4.1 Optional Collaborative Exercise 

Count the money (bills and change) in your pocket or purse. Your instructor will record the amounts. As a 
class, construct a histogram displaying the data. Discuss how many intervals you think is appropriate. You 
may want to experiment with the number of intervals. Discuss, also, the shape of the histogram. 

Record the data, in dollars (for example, 1.25 dollars). 

Construct a histogram. 



2.5 Measures of the Center of the Data 7 

The "center" of a data set is also a way of describing location. The two most widely used measures of the 
"center" of the data are the mean (average) and the median. To calculate the mean weight of 50 people, 
add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data 
and find the number that splits the data into two equal parts (previously discussed under box plots in this 
chapter). The median is generally a better measure of the center when there are extreme values or outliers 
because it is not affected by the precise numerical values of the outliers. The mean is the most common 
measure of the center. 

NOTE: The words "mean" and "average" are often used interchangeably. The substitution of one
word for the other is common practice. The technical term is "arithmetic mean" and "average" is
technically a center location. However, in practice among non-statisticians, "average" is commonly
accepted for "arithmetic mean."

7 This content is available online at <http://cnx.org/content/m17102/1.11/>.

The mean can also be calculated by multiplying each distinct value by its frequency, adding the products,
and then dividing the sum by the total number of data values. The letter used to represent the sample mean
is an x with a bar over it (pronounced "x bar"): $\bar{x}$.

The Greek letter $\mu$ (pronounced "mew") represents the population mean. One of the requirements for the
sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

To see that both ways of calculating the mean are the same, consider the sample: 

1; 1; 1; 2; 2; 3; 4; 4; 4; 4; 4 

$\bar{x} = \frac{1+1+1+2+2+3+4+4+4+4+4}{11} = 2.7$ (2.4) 

$\bar{x} = \frac{3 \times 1 + 2 \times 2 + 1 \times 3 + 5 \times 4}{11} = 2.7$ 

In the second example, the frequencies are 3, 2, 1, and 5. 
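The agreement between the two calculations can also be checked with a short program. The following Python sketch is an illustration only; Python is not otherwise used in this text, and the variable names are chosen for this example.

```python
from collections import Counter

sample = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4]

# Method 1: add every data value and divide by the number of values.
mean_direct = sum(sample) / len(sample)

# Method 2: multiply each distinct value by its frequency, add the
# products, and divide by the total number of data values.
frequencies = Counter(sample)  # value -> how many times it appears
weighted_sum = sum(value * freq for value, freq in frequencies.items())
mean_weighted = weighted_sum / sum(frequencies.values())

print(round(mean_direct, 1), round(mean_weighted, 1))
```

Both methods divide the same total, 30, by the same count, 11, so they must agree.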

You can quickly find the location of the median by using the expression $\frac{n+1}{2}$. 

The letter n is the total number of data values in the sample. If n is an odd number, the median is the middle 
value of the ordered data (ordered smallest to largest). If n is an even number, the median is equal to the 
two middle values added together and divided by 2 after the data has been ordered. For example, if the 
total number of data values is 97, then $\frac{n+1}{2} = \frac{97+1}{2} = 49$. The median is the 49th value in the ordered data. 
If the total number of data values is 100, then $\frac{n+1}{2} = \frac{100+1}{2} = 50.5$. The median occurs midway between the 
50th and 51st values. The location of the median and the value of the median are not the same. The upper 
case letter M is often used to represent the median. The next example illustrates the location of the median 
and the value of the median. 

Example 2.8 

AIDS data indicating the number of months an AIDS patient lives after taking a new antibody 
drug are as follows (smallest to largest): 

3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 
33; 33; 34; 34; 35; 37; 40; 44; 44; 47 

Calculate the mean and the median. 

Solution 

The calculation for the mean is: 

$\bar{x} = \frac{[3+4+(8)(2)+10+11+12+13+14+(15)(2)+(16)(2)+\dots+35+37+40+(44)(2)+47]}{40} = 23.6$ 

To find the median, M, first use the formula for the location. The location is: 

$\frac{n+1}{2} = \frac{40+1}{2} = 20.5$ 

Starting at the smallest value, the median is located between the 20th and 21st values (the two 
24s): 

3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 
33; 33; 34; 34; 35; 37; 40; 44; 44; 47 

$M = \frac{24+24}{2} = 24$ 

The median is 24. 
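The two-step procedure in Example 2.8 (find the location, then read off the value) can be sketched in Python. This is a minimal illustration, not part of the original example; the function names median_location and median_value are hypothetical names chosen here.

```python
def median_location(n):
    """Return the location (n + 1) / 2 of the median in the ordered data."""
    return (n + 1) / 2


def median_value(data):
    """Return the median by first finding its location in the ordered data."""
    ordered = sorted(data)
    location = median_location(len(ordered))
    if location.is_integer():
        # n is odd: the median is the single middle value.
        return ordered[int(location) - 1]
    # n is even: average the two values on either side of the location.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


months = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22,
          22, 24, 24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35,
          37, 40, 44, 44, 47]

print(median_location(len(months)))          # location of the median
print(median_value(months))                  # value of the median
print(round(sum(months) / len(months), 1))   # mean, to one decimal place
```

The location (20.5) and the value (24) are printed separately to emphasize that they are not the same thing.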



Example 2.9 

Suppose that, in a small town of 50 people, one person earns $5,000,000 per year and the other 49 
each earn $30,000. Which is the better measure of the "center," the mean or the median? 

Solution 

$\bar{x} = \frac{5000000 + 49 \times 30000}{50} = 129400$ 

M = 30000 

(There are 49 people who earn $30,000 and one person who earns $5,000,000.) 

The median is a better measure of the "center" than the mean because 49 of the values are 30,000 
and one is 5,000,000. The 5,000,000 is an outlier. The 30,000 gives us a better sense of the middle of 
the data. 



Another measure of the center is the mode. The mode is the most frequent value. If two values tie for 
the highest frequency, then the data set is bimodal. 

Example 2.10 

Statistics exam scores for 20 students are as follows: 

50 ; 53 ; 59 ; 59 ; 63 ; 63 ; 72 ; 72 ; 72 ; 72 ; 72 ; 76 ; 78 ; 81 ; 83 ; 84 ; 84 ; 84 ; 90 ; 93 

Problem 

Find the mode. 

Solution 

The most frequent score is 72, which occurs five times. Mode = 72. 



Example 2.11 

Five real estate exam scores are 430, 430, 480, 480, 495. The data set is bimodal because the scores 
430 and 480 each occur twice. 

When is the mode the best measure of the "center"? Consider a weight loss program that advertises 
an average weight loss of six pounds the first week of the program. The mode might indicate that 
most people lose two pounds the first week, making the program less appealing. 

NOTE: The mode can be calculated for qualitative data as well as for quantitative data. 
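As an illustration of how software finds the mode, the following Python sketch uses the standard library function statistics.multimode, which returns every value that ties for the highest frequency. The list of colors is hypothetical data made up for this sketch to show a qualitative example.

```python
from statistics import multimode

exam_scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72, 72, 76, 78, 81,
               83, 84, 84, 84, 90, 93]
real_estate_scores = [430, 430, 480, 480, 495]
colors = ["red", "blue", "red", "green"]  # hypothetical qualitative data

print(multimode(exam_scores))         # a single mode
print(multimode(real_estate_scores))  # two modes: the data set is bimodal
print(multimode(colors))              # the mode of qualitative data
```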

Statistical software will easily calculate the mean, the median, and the mode. Some graphing 
calculators can also make these calculations. In the real world, people make these calculations 
using software. 

2.5.1 The Law of Large Numbers and the Mean 

The Law of Large Numbers says that if you take samples of larger and larger size from any population, 
then the mean $\bar{x}$ of the sample is very likely to get closer and closer to $\mu$. This is discussed in more detail in 
The Central Limit Theorem. 
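A small simulation can make the Law of Large Numbers concrete. The sketch below is an assumption-laden illustration, not part of the original text: it uses a hypothetical population (the faces of a fair six-sided die, so $\mu$ = 3.5) and a fixed random seed so that the run is repeatable.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: the faces of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
population_mean = sum(faces) / len(faces)  # 3.5

for n in (10, 100, 1000, 10000, 100000):
    sample = [random.choice(faces) for _ in range(n)]
    sample_mean = sum(sample) / n
    distance = abs(sample_mean - population_mean)
    print(n, round(sample_mean, 3), round(distance, 3))
```

The distance between the sample mean and 3.5 tends to shrink as n grows, although in any single run it need not shrink at every step.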

NOTE: The formula for the mean is located in the Summary of Formulas (Section 2.8). 



2.5.2 Sampling Distributions and Statistic of a Sampling Distribution 

You can think of a sampling distribution as a relative frequency distribution with a great many samples. 
(See Sampling and Data for a review of relative frequency). Suppose thirty randomly selected students 
were asked the number of movies they watched the previous week. The results are in the relative frequency 
table shown below. 



# of movies | Relative Frequency 
0           | 5/30 
1           | 15/30 
2           | 6/30 
3           | 4/30 
4           | 1/30 

Table 2.5 

If you let the number of samples get very large (say, 300 million or more), the relative frequency table 
becomes a relative frequency distribution. 

A statistic is a number calculated from a sample. Statistic examples include the mean, the median and the 
mode as well as others. The sample mean $\bar{x}$ is an example of a statistic which estimates the population 
mean $\mu$. 



2.6 Skewness and the Mean, Median, and Mode 8 

Consider the following data set: 

4;5;6;6;6;7;7;7;7;7;7;8;8;8;9;10 

This data set produces the histogram shown below. Each interval has width one and each value is located 
in the middle of an interval. 



[Figure: histogram of the data set above.] 

8 This content is available online at <http://cnx.org/content/m17104/1.9/>. 



The histogram displays a symmetrical distribution of data. A distribution is symmetrical if a vertical line 
can be drawn at some point in the histogram such that the shape to the left and the right of the vertical 
line are mirror images of each other. The mean, the median, and the mode are each 7 for these data. In a 
perfectly symmetrical distribution, the mean and the median are the same. This example has one mode 
(unimodal) and the mode is the same as the mean and median. In a symmetrical distribution that has two 
modes (bimodal), the two modes would be different from the mean and median. 

The histogram for the data: 

4;5;6;6;6;7;7;7;7;8 

is not symmetrical. The right-hand side seems "chopped off" compared to the left side. The shape of the 
distribution is called skewed to the left because it is pulled out to the left. 



[Figure: histogram of the data set above.] 



The mean is 6.3, the median is 6.5, and the mode is 7. Notice that the mean is less than the median and 
they are both less than the mode. The mean and the median both reflect the skewing but the mean more so. 

The histogram for the data: 

6;7;7;7;7;8;8;8;9;10 

is also not symmetrical. It is skewed to the right. 

[Figure: histogram of the data set above.] 



The mean is 7.7, the median is 7.5, and the mode is 7. Of the three statistics, the mean is the largest, while 
the mode is the smallest. Again, the mean reflects the skewing the most. 

To summarize, generally if the distribution of data is skewed to the left, the mean is less than the median, 
which is often less than the mode. If the distribution of data is skewed to the right, the mode is often less 
than the median, which is less than the mean. 
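The three data sets in this section can be compared side by side. The following Python sketch, using the standard statistics module, is offered only as an illustration; the dictionary labels are names chosen for this example.

```python
from statistics import mean, median, mode

data_sets = {
    "symmetrical": [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10],
    "skewed left": [4, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "skewed right": [6, 7, 7, 7, 7, 8, 8, 8, 9, 10],
}

# Print the mean, median, and mode of each data set in that order.
for name, data in data_sets.items():
    print(name, mean(data), median(data), mode(data))
```

For the left-skewed data the order is mean < median < mode, and for the right-skewed data the order is reversed, which matches the summary above.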

Skewness and symmetry become important when we discuss probability distributions in later chapters. 

2.7 Measures of the Spread of the Data 9 

An important characteristic of any set of data is the variation in the data. In some data sets, the data values 
are concentrated closely near the mean; in other data sets, the data values are more widely spread out from 
the mean. The most common measure of variation, or spread, is the standard deviation. 

The standard deviation is a number that measures how far data values are from their mean. 

The standard deviation 

• provides a numerical measure of the overall amount of variation in a data set 

• can be used to determine whether a particular data value is close to or far from the mean 

The standard deviation provides a measure of the overall variation in a data set 

The standard deviation is always positive or 0. The standard deviation is small when the data are all 
concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when 
the data values are more spread out from the mean, exhibiting more variation. 

Suppose that we are studying waiting times at the checkout line for customers at supermarket A and 
supermarket B; the average wait time at both markets is 5 minutes. At market A, the standard deviation 
for the waiting time is 2 minutes; at market B the standard deviation for the waiting time is 4 minutes. 

Because market B has a higher standard deviation, we know that there is more variation in the waiting 
times at market B. Overall, wait times at market B are more spread out from the average; wait times at 
market A are more concentrated near the average. 



9 This content is available online at <http://cnx.org/content/m17103/1.14/>. 

The standard deviation can be used to determine whether a data value is close to or far from the mean. 

Suppose that Rosa and Binh both shop at Market A. Rosa waits for 7 minutes and Binh waits for 1 minute 
at the checkout counter. At market A, the mean wait time is 5 minutes and the standard deviation is 2 
minutes. The standard deviation can be used to determine whether a data value is close to or far from the 
mean. 

Rosa waits for 7 minutes: 

• 7 is 2 minutes longer than the average of 5; 2 minutes is equal to one standard deviation. 

• Rosa's wait time of 7 minutes is 2 minutes longer than the average of 5 minutes. 

• Rosa's wait time of 7 minutes is one standard deviation above the average of 5 minutes. 

Binh waits for 1 minute. 

• 1 is 4 minutes less than the average of 5; 4 minutes is equal to two standard deviations. 

• Binh's wait time of 1 minute is 4 minutes less than the average of 5 minutes. 

• Binh's wait time of 1 minute is two standard deviations below the average of 5 minutes. 

• A data value that is two standard deviations from the average is just on the borderline for what many 
statisticians would consider to be far from the average. Considering data to be far from the mean if it 
is more than 2 standard deviations away is more of an approximate "rule of thumb" than a rigid rule. 
In general, the shape of the distribution of the data affects how much of the data is further away than 
2 standard deviations. (We will learn more about this in later chapters.) 

The number line may help you understand standard deviation. If we were to put 5 and 7 on a number line, 
7 is to the right of 5. We say, then, that 7 is one standard deviation to the right of 5 because 

5 + (1) (2) = 7. 

If 1 were also part of the data set, then 1 is two standard deviations to the left of 5 because 

5 + (-2) (2) = 1. 



[Figure: number line marked from 1 to 7.] 



• In general, a value = mean + (#ofSTDEVs)(standard deviation) 

• where #ofSTDEVs = the number of standard deviations 

• 7 is one standard deviation more than the mean of 5 because: 7 = 5 + (1)(2) 

• 1 is two standard deviations less than the mean of 5 because: 1 = 5 + (-2)(2) 
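The relationship value = mean + (#ofSTDEVs)(standard deviation) can be sketched as a pair of small Python functions, one for each direction of the calculation. The function names are hypothetical and chosen only for this illustration; the numbers are the market A wait times used above.

```python
def value_from_stdevs(mean, num_stdevs, stdev):
    """Return mean + (#ofSTDEVs)(standard deviation)."""
    return mean + num_stdevs * stdev


def stdevs_from_mean(value, mean, stdev):
    """Return how many standard deviations a value lies from the mean."""
    return (value - mean) / stdev


mean_wait, stdev_wait = 5, 2  # market A, in minutes

print(value_from_stdevs(mean_wait, 1, stdev_wait))   # Rosa: 5 + (1)(2)
print(value_from_stdevs(mean_wait, -2, stdev_wait))  # Binh: 5 + (-2)(2)
print(stdevs_from_mean(7, mean_wait, stdev_wait))    # Rosa's wait of 7 minutes
print(stdevs_from_mean(1, mean_wait, stdev_wait))    # Binh's wait of 1 minute
```

A positive result from stdevs_from_mean means the value is above the mean; a negative result means it is below.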

The equation value = mean + (#ofSTDEVs) (standard deviation) can be expressed for a sample and for a 
population: 

• Sample: x = $\bar{x}$ + (#ofSTDEVs)(s) 

• Population: x = $\mu$ + (#ofSTDEVs)($\sigma$) 

The lower case letter s represents the sample standard deviation and the Greek letter $\sigma$ (sigma, lower case) 
represents the population standard deviation. 

The symbol $\bar{x}$ is the sample mean and the Greek symbol $\mu$ is the population mean. 

Calculating the Standard Deviation 

If x is a number, then the difference "x - mean" is called its deviation. In a data set, there are as many 
deviations as there are items in the data set. The deviations are used to calculate the standard deviation. 
If the numbers belong to a population, in symbols a deviation is $x - \mu$. For sample data, in symbols a 
deviation is $x - \bar{x}$. 

The procedure to calculate the standard deviation depends on whether the numbers are the entire population 
or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used 
to represent the standard deviation depends on whether it is calculated from a population or a sample. 
The lower case letter s represents the sample standard deviation and the Greek letter $\sigma$ (sigma, lower case) 
represents the population standard deviation. If the sample has the same characteristics as the population, 
then s should be a good estimate of $\sigma$. 

To calculate the standard deviation, we need to calculate the variance first. The variance is an average of 
the squares of the deviations (the $x - \bar{x}$ values for a sample, or the $x - \mu$ values for a population). The 
symbol $\sigma^2$ represents the population variance; the population standard deviation $\sigma$ is the square root of 
the population variance. The symbol $s^2$ represents the sample variance; the sample standard deviation s is 
the square root of the sample variance. You can think of the standard deviation as a special average of the 
deviations. 

If the numbers come from a census of the entire population and not a sample, when we calculate the average 
of the squared deviations to find the variance, we divide by N, the number of items in the population. 
If the data are from a sample rather than a population, when we calculate the average of the squared 
deviations, we divide by n - 1, one less than the number of items in the sample. You can see that in the formulas 
below. 

Formulas for the Sample Standard Deviation 

• $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{\sum f \cdot (x - \bar{x})^2}{n - 1}}$ 

• For the sample standard deviation, the denominator is n - 1, that is, the sample size MINUS 1. 

Formulas for the Population Standard Deviation 

• $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum f \cdot (x - \mu)^2}{N}}$ 

• For the population standard deviation, the denominator is N, the number of items in the population. 

In these formulas, f represents the frequency with which a value appears. For example, if a value appears 
once, f is 1. If a value appears three times in the data set or population, f is 3. 
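The two sets of formulas differ only in the denominator. The following Python sketch writes both out directly and compares them with the standard library functions statistics.stdev (sample) and statistics.pstdev (population). The function names sample_stdev and population_stdev and the data values are assumptions made for this illustration.

```python
import math
import statistics


def sample_stdev(data):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


def population_stdev(data):
    """Population standard deviation: divide the squared deviations by N."""
    N = len(data)
    mu = sum(data) / N
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)


data = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4]  # illustrative data values

print(sample_stdev(data), statistics.stdev(data))
print(population_stdev(data), statistics.pstdev(data))
```

For the same numbers, the sample formula always gives the larger result because it divides by the smaller denominator.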

Sampling Variability of a Statistic 

The statistic of a sampling distribution was discussed in Descriptive Statistics: Measuring the Center of 
the Data. How much the statistic varies from one sample to another is known as the sampling variability of 
a statistic. You typically measure the sampling variability of a statistic by its standard error. The standard 
error of the mean is an example of a standard error. It is a special standard deviation and is known as the 
standard deviation of the sampling distribution of the mean. You will cover the standard error of the mean 
in The Central Limit Theorem (not now). The notation for the standard error of the mean is $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is 
the standard deviation of the population and n is the size of the sample. 

NOTE: In practice, USE A CALCULATOR OR COMPUTER SOFTWARE TO CALCULATE 
THE STANDARD DEVIATION. If you are using a TI-83, 83+, or 84+ calculator, you need to select 
the appropriate standard deviation $\sigma_x$ or $s_x$ from the summary statistics. We will concentrate on 
using and interpreting the information that the standard deviation gives us. However, you should 
study the following step-by-step example to help you understand how the standard deviation 
measures variation from the mean. 

Example 2.12 

In a fifth grade class, the teacher was interested in the average age and the sample standard 
deviation of the ages of her students. The following data are the ages for a SAMPLE of n = 20 fifth 
grade students. The ages are rounded to the nearest half year: 

9 ; 9.5 ; 9.5 ; 10 ; 10 ; 10 ; 10 ; 10.5 ; 10.5 ; 10.5 ; 10.5 ; 11 ; 11 ; 11 ; 11 ; 11 ; 11 ; 11.5 ; 11.5 ; 11.5 

$\bar{x} = \frac{9 + 9.5 \times 2 + 10 \times 4 + 10.5 \times 4 + 11 \times 6 + 11.5 \times 3}{20} = 10.525$ (2.6) 

The average age is 10.53 years, rounded to 2 places. 



The variance may be calculated by using a table. Then the standard deviation is calculated by 
taking the square root of the variance. We will explain the parts of the table after calculating s. 



Data | Freq. | Deviations            | Deviations$^2$           | (Freq.)(Deviations$^2$) 
$x$  | $f$   | $(x - \bar{x})$       | $(x - \bar{x})^2$        | $(f)(x - \bar{x})^2$ 
9    | 1     | 9 - 10.525 = -1.525   | $(-1.525)^2$ = 2.325625  | 1 x 2.325625 = 2.325625 
9.5  | 2     | 9.5 - 10.525 = -1.025 | $(-1.025)^2$ = 1.050625  | 2 x 1.050625 = 2.101250 
10   | 4     | 10 - 10.525 = -0.525  | $(-0.525)^2$ = 0.275625  | 4 x 0.275625 = 1.1025 
10.5 | 4     | 10.5 - 10.525 = -0.025 | $(-0.025)^2$ = 0.000625 | 4 x 0.000625 = 0.0025 
11   | 6     | 11 - 10.525 = 0.475   | $(0.475)^2$ = 0.225625   | 6 x 0.225625 = 1.35375 
11.5 | 3     | 11.5 - 10.525 = 0.975 | $(0.975)^2$ = 0.950625   | 3 x 0.950625 = 2.851875 

Table 2.6 



The sample variance, $s^2$, is equal to the sum of the last column (9.7375) divided by the total number 
of data values minus one (20 - 1): 

$s^2 = \frac{9.7375}{20 - 1} = 0.5125$ 

The sample standard deviation s is equal to the square root of the sample variance: 

$s = \sqrt{0.5125} = 0.715891$. Rounded to two decimal places, s = 0.72. 

Typically, you do the calculation for the standard deviation on your calculator or computer. The 
intermediate results are not rounded. This is done for accuracy. 

Problem 1 

Verify the mean and standard deviation calculated above on your calculator or computer. 

Solution 

For the TI-83,83+,84+, enter data into the list editor. 

Put the data values in list L1 and the frequencies in list L2. 

STAT CALC 1-VarStats L1, L2 

$\bar{x}$ = 10.525 

Use Sx because this is sample data (not a population): Sx = 0.715891 

• For the following problems, recall that value = mean + (#ofSTDEVs)(standard deviation) 

• For a sample: x = $\bar{x}$ + (#ofSTDEVs)(s) 

• For a population: x = $\mu$ + (#ofSTDEVs)($\sigma$) 

• For this example, use x = $\bar{x}$ + (#ofSTDEVs)(s) because the data is from a sample 

Problem 2 

Find the value that is 1 standard deviation above the mean. Find ($\bar{x}$ + 1s). 

Solution 

($\bar{x}$ + 1s) = 10.53 + (1)(0.72) = 11.25 



Problem 3 

Find the value that is two standard deviations below the mean. Find ($\bar{x}$ - 2s). 

Solution 

($\bar{x}$ - 2s) = 10.53 - (2)(0.72) = 9.09 

Problem 4 

Find the values that are 1.5 standard deviations from (below and above) the mean. 

Solution 



($\bar{x}$ - 1.5s) = 10.53 - (1.5)(0.72) = 9.45 
($\bar{x}$ + 1.5s) = 10.53 + (1.5)(0.72) = 11.61 
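The results of Problems 1 through 4 can also be checked on a computer. The Python sketch below is one possible way to do so and is not the method used in the text, which relies on a TI calculator. It expands the frequency table into the 20 individual ages and uses the standard statistics module; the variable names are chosen for this illustration.

```python
import statistics

# Frequency table from Example 2.12: age -> number of students.
age_frequencies = {9: 1, 9.5: 2, 10: 4, 10.5: 4, 11: 6, 11.5: 3}

# Expand the table back into the 20 individual data values.
ages = [age for age, freq in age_frequencies.items() for _ in range(freq)]

x_bar = statistics.mean(ages)  # sample mean
s = statistics.stdev(ages)     # sample standard deviation (divides by n - 1)
print(len(ages), round(x_bar, 3), round(s, 6))

# Problems 2 to 4 use the rounded values 10.53 and 0.72, as the text does.
x_bar_rounded, s_rounded = 10.53, 0.72
for num_stdevs in (1, -2, -1.5, 1.5):
    print(num_stdevs, round(x_bar_rounded + num_stdevs * s_rounded, 2))
```

Using the unrounded mean and standard deviation in the last loop would give slightly different answers; the text rounds first, so the sketch does too.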



Explanation of the standard deviation calculation shown in the table 

The deviations show how spread out the data are about the mean. The data value 11.5 is farther from the 
mean than is the data value 11. The deviations 0.975 and 0.475 indicate that. A positive deviation occurs when 
the data value is greater than the mean. A negative deviation occurs when the data value is less than the 
mean; the deviation is -1.525 for the data value 9. If you add the deviations, the sum is always zero. (For 
this example, there are n=20 deviations.) So you cannot simply add the deviations to get the spread of the 
data. By squaring the deviations, you make them positive numbers, and the sum will also be positive. The 
variance, then, is the average squared deviation. 

The variance is a squared measure and does not have the same units as the data. Taking the square root 
solves the problem. The standard deviation measures the spread in the same units as the data. 

Notice that instead of dividing by n = 20, the calculation divided by n - 1 = 20 - 1 = 19 because the data is a 
sample. For the sample variance, we divide by the sample size minus one (n - 1). Why not divide by n? The 
answer has to do with the population variance. The sample variance is an estimate of the population 
variance. Based on the theoretical mathematics that lies behind these calculations, dividing by (n - 1) gives a 
better estimate of the population variance. 
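A simulation can suggest why dividing by (n - 1) gives the better estimate. The sketch below is an illustration under stated assumptions, not a proof: the population is hypothetical (the faces of a fair die, whose variance is exactly 35/12), the sample size of 5 and the 20,000 trials are arbitrary choices, and the random seed is fixed so the run is repeatable.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: the faces of a fair die, whose variance is known.
faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / len(faces)
population_variance = sum((x - mu) ** 2 for x in faces) / len(faces)  # 35/12

n, trials = 5, 20000
total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.choice(faces) for _ in range(n)]
    x_bar = sum(sample) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in sample)
    total_div_n += squared_deviations / n
    total_div_n_minus_1 += squared_deviations / (n - 1)

# Average each kind of estimate over all of the samples.
avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials
print(round(population_variance, 3), round(avg_div_n, 3), round(avg_div_n_minus_1, 3))
```

On average, the estimates that divide by n come out too small, while the estimates that divide by n - 1 land close to the population variance.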

NOTE: Your concentration should be on what the standard deviation tells us about the data. The 
standard deviation is a number which measures how far the data are spread from the mean. Let a 
calculator or computer do the arithmetic. 

The standard deviation, s or $\sigma$, is either zero or larger than zero. When the standard deviation is 0, there is 
no spread; that is, all the data values are equal to each other. The standard deviation is small when the 
data are all concentrated close to the mean, and is larger when the data values show more variation from 
the mean. When the standard deviation is a lot larger than zero, the data values are very spread out about 
the mean; outliers can make s or $\sigma$ very large. 

The standard deviation, when first presented, can seem unclear. By graphing your data, you can get a 
better "feel" for the deviations and the standard deviation. You will find that in symmetrical distributions, 
the standard deviation can be very helpful but in skewed distributions, the standard deviation may not be 
much help. The reason is that the two sides of a skewed distribution have different spreads. In a skewed 
distribution, it is better to look at the first quartile, the median, the third quartile, the smallest value, and 
the largest value. Because numbers can be confusing, always graph your data. 

NOTE: The formula for the standard deviation is at the end of the chapter. 

Example 2.13 

Use the following data (first exam scores) from Susan Dean's spring pre-calculus class: 

33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 
96; 100 

a. Create a chart containing the data, frequencies, relative frequencies, and cumulative relative 
frequencies to three decimal places. 

b. Calculate the following to one decimal place using a TI-83+ or TI-84 calculator: 

i. The sample mean 

ii. The sample standard deviation 

iii. The median 

iv. The first quartile 

v. The third quartile 

vi. IQR 

c. Construct a box plot and a histogram on the same set of axes. Make comments about the box 
plot, the histogram, and the chart. 

Solution 

a. 

Data | Frequency | Relative Frequency | Cumulative Relative Frequency 
33   | 1         | 0.032              | 0.032 
42   | 1         | 0.032              | 0.064 
49   | 2         | 0.065              | 0.129 
53   | 1         | 0.032              | 0.161 
55   | 2         | 0.065              | 0.226 
61   | 1         | 0.032              | 0.258 
63   | 1         | 0.032              | 0.290 
67   | 1         | 0.032              | 0.322 
68   | 2         | 0.065              | 0.387 
69   | 2         | 0.065              | 0.452 
72   | 1         | 0.032              | 0.484 
73   | 1         | 0.032              | 0.516 
74   | 1         | 0.032              | 0.548 
78   | 1         | 0.032              | 0.580 
80   | 1         | 0.032              | 0.612 
83   | 1         | 0.032              | 0.644 
88   | 3         | 0.097              | 0.741 
90   | 1         | 0.032              | 0.773 
92   | 1         | 0.032              | 0.805 
94   | 4         | 0.129              | 0.934 
96   | 1         | 0.032              | 0.966 
100  | 1         | 0.032              | 0.998 (Why isn't this value 1?) 

Table 2.7 



b. 

i. The sample mean = 73.5 

ii. The sample standard deviation = 17.9 

iii. The median = 73 

iv. The first quartile = 61 

v. The third quartile = 90 

vi. IQR = 90 - 61 = 29 

c. The x-axis goes from 32.5 to 100.5; the y-axis goes from -2.4 to 15 for the histogram. The number of 
intervals is 5 for the histogram, so the width of an interval is (100.5 - 32.5) divided by 5, which 
is equal to 13.6. Endpoints of the intervals: the starting point is 32.5; 32.5 + 13.6 = 46.1; 46.1 + 13.6 = 
59.7; 59.7 + 13.6 = 73.3; 73.3 + 13.6 = 86.9; 86.9 + 13.6 = 100.5, the ending value. No data values 
fall on an interval boundary. 

Figure 2.1 



The long left whisker in the box plot is reflected in the left side of the histogram. The spread of 
the exam scores in the lower 50% is greater (73 - 33 = 40) than the spread in the upper 50% (100 - 
73 = 27). The histogram, box plot, and chart all reflect this. There are a substantial number of A 
and B grades (80s, 90s, and 100). The histogram clearly shows this. The box plot shows us that the 
middle 50% of the exam scores (IQR = 29) are Ds, Cs, and Bs. The box plot also shows us that the 
lower 25% of the exam scores are Ds and Fs. 
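The summary statistics in part b can be reproduced with software other than a TI calculator. The Python sketch below is one such check, offered as an illustration. Software packages differ in how they define quartiles; for these 31 scores, the "exclusive" method of statistics.quantiles happens to land on the 8th, 16th, and 24th ordered values, which matches the answers above, but that agreement should not be assumed for other data sets.

```python
import statistics

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

x_bar = statistics.mean(scores)  # sample mean
s = statistics.stdev(scores)     # sample standard deviation

# Cut points that split the ordered data into four equal parts.
q1, med, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print(round(x_bar, 1), round(s, 1), med, q1, q3, iqr)
```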

Comparing Values from Different Data Sets 

The standard deviation is useful when comparing data values that come from different data sets. If the data 
sets have different means and standard deviations, it can be misleading to compare the data values directly. 

• For each data value, calculate how many standard deviations the value is away from its mean. 

• Use the formula: value = mean + (#ofSTDEVs)(standard deviation); solve for #ofSTDEVs. 

• #ofSTDEVs = $\frac{\text{value} - \text{mean}}{\text{standard deviation}}$ 

• Compare the results of this calculation. 



#ofSTDEVs is often called a "z-score"; we can use the symbol z. In symbols, the formulas become: 



Sample     | $x = \bar{x} + zs$   | $z = \frac{x - \bar{x}}{s}$ 
Population | $x = \mu + z\sigma$  | $z = \frac{x - \mu}{\sigma}$ 

Table 2.8 

Example 2.14 

Two students, John and Ali, from different high schools, wanted to find out who had the highest 
G.P.A. when compared to his school. Which student had the highest G.P.A. when compared to his 
school? 



Student | GPA  | School Mean GPA | School Standard Deviation 
John    | 2.85 | 3.0             | 0.7 
Ali     | 77   | 80              | 10 

Table 2.9 

Solution 

For each student, determine how many standard deviations (#ofSTDEVs) his GPA is away from 
the average, for his school. Pay careful attention to signs when comparing and interpreting the 
answer. 



#ofSTDEVs = $\frac{\text{value} - \text{mean}}{\text{standard deviation}}$; $z = \frac{x - \mu}{\sigma}$ 

For John, z = #ofSTDEVs = $\frac{2.85 - 3.0}{0.7} = -0.21$ 

For Ali, z = #ofSTDEVs = $\frac{77 - 80}{10} = -0.3$ 



John has the better G.P.A. when compared to his school because his G.P.A. is 0.21 standard 
deviations below his school's mean while Ali's G.P.A. is 0.3 standard deviations below his 
school's mean. 

John's z-score of -0.21 is higher than Ali's z-score of -0.3. For GPA, higher values are 
better, so we conclude that John has the better GPA when compared to his school. 
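The comparison in Example 2.14 can be sketched in Python. The function name z_score is a hypothetical name chosen for this illustration; the numbers come from Table 2.9.

```python
def z_score(value, mean, stdev):
    """Return the number of standard deviations a value lies from its mean."""
    return (value - mean) / stdev


z_john = z_score(2.85, 3.0, 0.7)
z_ali = z_score(77, 80, 10)
print(round(z_john, 2), round(z_ali, 2))

# For GPA, the higher z-score is the better standing relative to the school.
better = "John" if z_john > z_ali else "Ali"
print(better)
```

The two schools use different GPA scales, which is why the raw values 2.85 and 77 cannot be compared directly but the z-scores can.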



The following lists give a few facts that provide a little more insight into what the standard deviation tells 
us about the distribution of the data. 

For ANY data set, no matter what the distribution of the data is: 

• At least 75% of the data is within 2 standard deviations of the mean. 

• At least 89% of the data is within 3 standard deviations of the mean. 

• At least 95% of the data is within 4 1/2 standard deviations of the mean. 

• This is known as Chebyshev's Rule. 
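The three percentages listed for Chebyshev's Rule come from one expression. The text does not state it, but the standard form of the rule is that at least $1 - \frac{1}{k^2}$ of the data lies within k standard deviations of the mean, for any k greater than 1. The Python sketch below evaluates that bound; the function name is a hypothetical choice for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k ** 2


# Evaluate the bound for the three cases listed in the text.
for k in (2, 3, 4.5):
    print(k, f"{chebyshev_lower_bound(k):.1%}")
```

These are guaranteed minimums for any data set; for a particular data set the actual percentages are often much higher.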

For data having a distribution that is MOUND-SHAPED and SYMMETRIC: 

• Approximately 68% of the data is within 1 standard deviation of the mean. 

• Approximately 95% of the data is within 2 standard deviations of the mean. 

• More than 99% of the data is within 3 standard deviations of the mean. 

• This is known as the Empirical Rule. 

• It is important to note that this rule only applies when the shape of the distribution of the data is 
mound-shaped and symmetric. We will learn more about this when studying the "Normal" or "Gaussian" 
probability distribution in later chapters. 

**With contributions from Roberta Bloom 



2.8 Summary of Formulas 10 

Commonly Used Symbols 

• The symbol $\Sigma$ means to add or to find the sum. 

• n = the number of data values in a sample 

• N = the number of people, things, etc. in the population 

• $\bar{x}$ = the sample mean 

• s = the sample standard deviation 

• $\mu$ = the population mean 

• $\sigma$ = the population standard deviation 

• f = frequency 

• x = numerical value 

Commonly Used Expressions 

• $x \cdot f$ = A value multiplied by its respective frequency 

• $\sum x$ = The sum of the values 

• $\sum x \cdot f$ = The sum of values multiplied by their respective frequencies 

• $(x - \bar{x})$ or $(x - \mu)$ = Deviations from the mean (how far a value is from the mean) 

• $(x - \bar{x})^2$ or $(x - \mu)^2$ = Deviations squared 

• $f(x - \bar{x})^2$ or $f(x - \mu)^2$ = The deviations squared and multiplied by their frequencies 



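Mean Formulas: 

• $\bar{x} = \frac{\sum x}{n}$ or $\bar{x} = \frac{\sum f \cdot x}{n}$ 

• $\mu = \frac{\sum x}{N}$ or $\mu = \frac{\sum f \cdot x}{N}$ 

Standard Deviation Formulas: 

• $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{\sum f \cdot (x - \bar{x})^2}{n - 1}}$ 

• $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum f \cdot (x - \mu)^2}{N}}$ 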



Formulas Relating a Value, the Mean, and the Standard Deviation: 

• value = mean + (#ofSTDEVs)(standard deviation), where #ofSTDEVs = the number of standard 
deviations 

• x = $\bar{x}$ + (#ofSTDEVs)(s) 

• x = $\mu$ + (#ofSTDEVs)($\sigma$) 

10 This content is available online at <http://cnx.org/content/m16310/1.9/>. 

2.9 Practice 1: Center of the Data 11 

2.9.1 Student Learning Outcomes 

• The student will calculate and interpret the center, spread, and location of the data. 

• The student will construct and interpret histograms and box plots. 



2.9.2 Given 

Sixty-five randomly selected car salespersons were asked the number of cars they generally sell in one 
week. Fourteen people answered that they generally sell three cars; nineteen generally sell four cars; twelve 
generally sell five cars; nine generally sell six cars; eleven generally sell seven cars. 

2.9.3 Complete the Table 



Data Value (# cars) | Frequency | Relative Frequency | Cumulative Relative Frequency 
                    |           |                    | 
                    |           |                    | 
                    |           |                    | 
                    |           |                    | 
                    |           |                    | 

Table 2.10 



(Solution on p. 91.) 
(Solution on p. 91.) 



2.9.4 Discussion Questions 

Exercise 2.9.1 

What does the frequency column sum to? Why? 

Exercise 2.9.2 

What does the relative frequency column sum to? Why? 

Exercise 2.9.3 

What is the difference between relative frequency and frequency for each data value? 

Exercise 2.9.4 

What is the difference between cumulative relative frequency and relative frequency for each data 
value? 



2.9.5 Enter the Data 

Enter your data into your calculator or computer. 

11 This content is available online at <http://cnx.org/content/m16312/1.12/>. 

2.9.6 Construct a Histogram 

Determine appropriate minimum and maximum x and y values and the scaling. Sketch the histogram 
below. Label the horizontal and vertical axes with words. Include numerical scaling. 



2.9.7 Data Statistics 

Calculate the following values: 

Exercise 2.9.5 (Solution on p. 91.) 

Sample mean = $\bar{x}$ = 

Exercise 2.9.6 (Solution on p. 91.) 

Sample standard deviation = $s_x$ = 

Exercise 2.9.7 (Solution on p. 91.) 

Sample size = n = 



2.9.8 Calculations 

Use the table in section 2.9.3 to calculate the following values: 

Exercise 2.9.8 (Solution on p. 91.) 

Median = 

Exercise 2.9.9 (Solution on p. 91.) 

Mode = 

Exercise 2.9.10 (Solution on p. 91.) 

First quartile = 

Exercise 2.9.11 (Solution on p. 91.) 

Second quartile = median = 50th percentile = 

Exercise 2.9.12 (Solution on p. 92.) 

Third quartile = 

Exercise 2.9.13 (Solution on p. 92.) 

Interquartile range (IQR) = ____ - ____ = ____ 

Exercise 2.9.14 (Solution on p. 92.) 

10th percentile = 

Exercise 2.9.15 (Solution on p. 92.) 

70th percentile = 

Exercise 2.9.16 (Solution on p. 92.) 

Find the value that is 3 standard deviations: 

a. Above the mean 

b. Below the mean 



2.9.9 Box Plot 

Construct a box plot below. Use a ruler to measure and scale accurately. 



2.9.10 Interpretation 

Looking at your box plot, does it appear that the data are concentrated together, spread out evenly, or 
concentrated in some areas, but not in others? How can you tell? 

2.10 Practice 2: Spread of the Data 12 

2.10.1 Student Learning Outcomes 

• The student will calculate measures of the center of the data. 

• The student will calculate the spread of the data. 

2.10.2 Given 

The population parameters below describe the full-time equivalent number of students (FTES) each year 
at Lake Tahoe Community College from 1976-77 through 2004-2005. (Source: Graphically Speaking by Bill 
King, LTCC Institutional Research, December 2005). 

Use these values to answer the following questions: 

• μ = 1000 FTES

• Median = 1014 FTES

• σ = 474 FTES

• First quartile = 528.5 FTES

• Third quartile = 1447.5 FTES

• n = 29 years



2.10.3 Calculate the Values 

Exercise 2.10.1 (Solution on p. 92.)

A sample of 11 years is taken. About how many are expected to have a FTES of 1014 or above?
Explain how you determined your answer.

Exercise 2.10.2 (Solution on p. 92.)

75% of all years have a FTES:

a. At or below:

b. At or above:

Exercise 2.10.3 (Solution on p. 92.)

The population standard deviation =

Exercise 2.10.4 (Solution on p. 92.)

What percent of the FTES were from 528.5 to 1447.5? How do you know?

Exercise 2.10.5 (Solution on p. 92.)

What is the IQR? What does the IQR represent?

Exercise 2.10.6 (Solution on p. 92.)

How many standard deviations away from the mean is the median?



12 This content is available online at <http://cnx.org/content/m17105/1.11/>.

2.11 Homework 13



Exercise 2.11.1 (Solution on p. 92.)

Twenty-five randomly selected students were asked the number of movies they watched the previous
week. The results are as follows:

# of movies    Frequency    Relative Frequency    Cumulative Relative Frequency
0              5
1              9
2              6
3              4
4              1

Table 2.11

a. Find the sample mean x̄.
b. Find the sample standard deviation, s.
c. Construct a histogram of the data.
d. Complete the columns of the chart.
e. Find the first quartile.
f. Find the median.
g. Find the third quartile.
h. Construct a box plot of the data.
i. What percent of the students saw fewer than three movies?
j. Find the 40th percentile.
k. Find the 90th percentile.
l. Construct a line graph of the data.
m. Construct a stem plot of the data.



Exercise 2.11.2 

The median age for U.S. blacks currently is 30.9 years; for U.S. whites it is 42.3
years. (Source: http://www.usatoday.com/news/nation/story/2012-05-17/minority-births-census/55029100/1)

a. Based upon this information, give two reasons why the black median age could be lower than
the white median age.

b. Does the lower median age for blacks necessarily mean that blacks die younger than whites?
Why or why not?

c. How might it be possible for blacks and whites to die at approximately the same age, but for
the median age for whites to be higher?

Exercise 2.11.3 (Solution on p. 93.)

Forty randomly selected students were asked the number of pairs of sneakers they owned. Let X
= the number of pairs of sneakers owned. The results are as follows:

X    Frequency    Relative Frequency    Cumulative Relative Frequency
1    2
2    5
3    8
4    12
5    12
7    1

Table 2.12

a. Find the sample mean x̄.

b. Find the sample standard deviation, s.

c. Construct a histogram of the data.

d. Complete the columns of the chart.

e. Find the first quartile.

f. Find the median.

g. Find the third quartile.

h. Construct a box plot of the data.

i. What percent of the students owned at least five pairs?

j. Find the 40th percentile.

k. Find the 90th percentile.

l. Construct a line graph of the data.

m. Construct a stem plot of the data.

13 This content is available online at <http://cnx.org/content/m16801/1.24/>.

Exercise 2.11.4 

600 adult Americans were asked by telephone poll, "What do you think constitutes a middle-class
income?" The results are below. Also, include left endpoint, but not the right endpoint. (Source:
Time magazine; survey by Yankelovich Partners, Inc.)

NOTE: "Not sure" answers were omitted from the results.

Salary ($)         Relative Frequency
< 20,000           0.02
20,000 - 25,000    0.09
25,000 - 30,000    0.19
30,000 - 40,000    0.26
40,000 - 50,000    0.18
50,000 - 75,000    0.17
75,000 - 99,999    0.02
100,000+           0.01

Table 2.13

a. What percent of the survey answered "not sure"?

b. What percent think that middle-class is from $25,000 - $50,000?

c. Construct a histogram of the data

   a. Should all bars have the same width, based on the data? Why or why not?

   b. How should the <20,000 and the 100,000+ intervals be handled? Why?

d. Find the 40th and 80th percentiles

e. Construct a bar graph of the data

Exercise 2.11.5 (Solution on p. 93.) 

Following are the published weights (in pounds) of all of the team members of the San Francisco 
49ers from a previous year (Source: San Jose Mercury News) 

177; 205; 210; 210; 232; 205; 185; 185; 178; 210; 206; 212; 184; 174; 185; 242; 188; 212; 215; 247; 241; 
223; 220; 260; 245; 259; 278; 270; 280; 295; 275; 285; 290; 272; 273; 280; 285; 286; 200; 215; 185; 230; 
250; 241; 190; 260; 250; 302; 265; 290; 276; 228; 265 

a. Organize the data from smallest to largest value. 

b. Find the median. 

c. Find the first quartile. 

d. Find the third quartile. 

e. Construct a box plot of the data. 

f. The middle 50% of the weights are from _____ to _____.

g. If our population were all professional football players, would the above data be a sample of
weights or the population of weights? Why?

h. If our population were the San Francisco 49ers, would the above data be a sample of weights
or the population of weights? Why?

i. Assume the population was the San Francisco 49ers. Find:

i. the population mean, μ.
ii. the population standard deviation, σ.
iii. the weight that is 2 standard deviations below the mean.
iv. When Steve Young, quarterback, played football, he weighed 205 pounds. How many
standard deviations above or below the mean was he?

j. That same year, the average weight for the Dallas Cowboys was 240.08 pounds with a standard
deviation of 44.38 pounds. Emmitt Smith weighed in at 209 pounds. With respect to his team,
who was lighter, Smith or Young? How did you determine your answer?

Exercise 2.11.6 

An elementary school class ran 1 mile in an average of 11 minutes with a standard deviation of 
3 minutes. Rachel, a student in the class, ran 1 mile in 8 minutes. A junior high school class ran 
1 mile in an average of 9 minutes, with a standard deviation of 2 minutes. Kenji, a student in the 
class, ran 1 mile in 8.5 minutes. A high school class ran 1 mile in an average of 7 minutes with a 
standard deviation of 4 minutes. Nedda, a student in the class, ran 1 mile in 8 minutes. 

a. Why is Kenji considered a better runner than Nedda, even though Nedda ran faster than he? 

b. Who is the fastest runner with respect to his or her class? Explain why. 

Exercise 2.11.7 

In a survey of 20 year olds in China, Germany and America, people were asked the number of 
foreign countries they had visited in their lifetime. The following box plots display the results.

[Figure: box plots for China, Germany, and America on a common axis; recoverable axis values: 1, 8, 10, 11]

a. In complete sentences, describe what the shape of each box plot implies about the distribution
of the data collected.

b. Explain how it is possible that more Americans than Germans surveyed have been to over eight
foreign countries.

c. Compare the three box plots. What do they imply about the foreign travel of twenty year old
residents of the three countries when compared to each other?

Exercise 2.11.8 

One hundred teachers attended a seminar on mathematical problem solving. The attitudes of 
a representative sample of 12 of the teachers were measured before and after the seminar. A 
positive number for change in attitude indicates that a teacher's attitude toward math became 
more positive. The twelve change scores are as follows: 

3; 8; -1; 2; 0; 5; -3; 1; -1; 6; 5; -2 

a. What is the average change score? 

b. What is the standard deviation for this population? 

c. What is the median change score? 

d. Find the change score that is 2.2 standard deviations below the mean. 

Exercise 2.11.9 (Solution on p. 93.) 

Three students were applying to the same graduate school. They came from schools with different 
grading systems. Which student had the best G.P.A. when compared to his school? Explain how 
you determined your answer. 



Student    G.P.A.    School Ave. G.P.A.    School Standard Deviation
Thuy       2.7       3.2                   0.8
Vichet     87        75                    20
Kamala     8.6       8                     0.4

Table 2.14

Exercise 2.11.10

Given the following box plot: 




a. Which quarter has the smallest spread of data? What is that spread? 

b. Which quarter has the largest spread of data? What is that spread? 

c. Find the Inter Quartile Range (IQR). 

d. Are there more data in the interval 5 - 10 or in the interval 10 - 13? How do you know this? 

e. Which interval has the fewest data in it? How do you know this? 

I. 0-2 

II. 2-4 

III. 10-12 

IV. 12-13 



Exercise 2.11.11 

Given the following box plot:

[Figure: box plot; recoverable axis values: 20, 100, 150]

a. Think of an example (in words) where the data might fit into the above box plot. In 2-5 sentences,
write down the example.

b. What does it mean to have the first and second quartiles so close together, while the second to
fourth quartiles are far apart?

Exercise 2.11.12 

Santa Clara County, CA, has approximately 27,873 Japanese-Americans. Their ages are as follows.
(Source: West magazine)

Age Group    Percent of Community
0-17         18.9
18-24        8.0
25-34        22.8
35-44        15.0
45-54        13.1
55-64        11.9
65+          10.3

Table 2.15

a. Construct a histogram of the Japanese-American community in Santa Clara County, CA. The
bars will not be the same width for this example. Why not?

b. What percent of the community is under age 35?

c. Which box plot most resembles the information above?

i. [Box plot; recoverable axis values: 24, 34, 53, 100]

ii. [Box plot; recoverable axis values: 18, 34, 45, 100]

iii. [Box plot; recoverable axis values: 24, 25, 54, 100]

Exercise 2.11.13 

Suppose that three book publishers were interested in the number of fiction paperbacks adult 
consumers purchase per month. Each publisher conducted a survey. In the survey, each asked 
adult consumers the number of fiction paperbacks they had purchased the previous month. The 
results are below. 

Publisher A

# of books    Freq.    Rel. Freq.
0             10
1             12
2             16
3             12
4             8
5             6
6             2
8             2

Table 2.16

Publisher B

# of books    Freq.    Rel. Freq.
0             18
1             24
2             24
3             22
4             15
5             10
7             5
9             1

Table 2.17

Publisher C

# of books    Freq.    Rel. Freq.
0-1           20
2-3           35
4-5           12
6-7           2
8-9           1

Table 2.18



a. Find the relative frequencies for each survey. Write them in the charts. 

b. Using either a graphing calculator, computer, or by hand, use the frequency column to construct
a histogram for each publisher's survey. For Publishers A and B, make bar widths of 1. For
Publisher C, make bar widths of 2.

c. In complete sentences, give two reasons why the graphs for Publishers A and B are not identical.

d. Would you have expected the graph for Publisher C to look like the other two graphs? Why or
why not?

e. Make new histograms for Publisher A and Publisher B. This time, make bar widths of 2.

f. Now, compare the graph for Publisher C to the new graphs for Publishers A and B. Are the
graphs more similar or more different? Explain your answer.

Exercise 2.11.14 

Often, cruise ships conduct all on-board transactions, with the exception of gambling, on a cashless
basis. At the end of the cruise, guests pay one bill that covers all on-board transactions. Suppose
that 60 single travelers and 70 couples were surveyed as to their on-board bills for a seven-day
cruise from Los Angeles to the Mexican Riviera. Below is a summary of the bills for each group.

Singles

Amount($)    Frequency    Rel. Frequency
51-100       5
101-150      10
151-200      15
201-250      15
251-300      10
301-350      5

Table 2.19

Couples

Amount($)    Frequency    Rel. Frequency
100-150      5
201-250      5
251-300      5
301-350      5
351-400      10
401-450      10
451-500      10
501-550      10
551-600      5
601-650      5

Table 2.20



a. Fill in the relative frequency for each group.

b. Construct a histogram for the Singles group. Scale the x-axis by $50 widths. Use relative
frequency on the y-axis.

c. Construct a histogram for the Couples group. Scale the x-axis by $50. Use relative frequency on
the y-axis.

d. Compare the two graphs:

i. List two similarities between the graphs.
ii. List two differences between the graphs.
iii. Overall, are the graphs more similar or different?

e. Construct a new graph for the Couples by hand. Since each couple is paying for two individuals,
instead of scaling the x-axis by $50, scale it by $100. Use relative frequency on the
y-axis.

f. Compare the graph for the Singles with the new graph for the Couples:

i. List two similarities between the graphs.

ii. Overall, are the graphs more similar or different?

i. By scaling the Couples graph differently, how did it change the way you compared it to the
Singles?

j. Based on the graphs, do you think that individuals spend the same amount, more or less, as
singles as they do person by person in a couple? Explain why in one or two complete sentences.



Exercise 2.11.15 (Solution on p. 93.) 

Refer to the following histograms and box plot. Determine which of the following are true and 
which are false. Explain your solution to each part in complete sentences. 



[Figure: three graphs labeled a, b, and c (histograms and a box plot)]



a. The medians for all three graphs are the same. 

b. We cannot determine if any of the means for the three graphs is different. 

c. The standard deviation for (b) is larger than the standard deviation for (a). 

d. We cannot determine if any of the third quartiles for the three graphs is different. 



Exercise 2.11.16 

Refer to the following box plots.

[Figure: box plots for Data 1 and Data 2]

a. In complete sentences, explain why each statement is false.

i. Data 1 has more data values above 2 than Data 2 has above 2.

ii. The data sets cannot have the same mode.

iii. For Data 1, there are more data values below 4 than there are above 4.

b. For which group, Data 1 or Data 2, is the value of "7" more likely to be an outlier? Explain why
in complete sentences.



Exercise 2.11.17 (Solution on p. 93.) 

In a recent issue of the IEEE Spectrum, 84 engineering conferences were announced. Four conferences
lasted two days. Thirty-six lasted three days. Eighteen lasted four days. Nineteen lasted
five days. Four lasted six days. One lasted seven days. One lasted eight days. One lasted nine
days. Let X = the length (in days) of an engineering conference.

a. Organize the data in a chart.
b. Find the median, the first quartile, and the third quartile.
c. Find the 65th percentile.
d. Find the 10th percentile.
e. Construct a box plot of the data.
f. The middle 50% of the conferences last from _____ days to _____ days.
g. Calculate the sample mean of days of engineering conferences.
h. Calculate the sample standard deviation of days of engineering conferences.
i. Find the mode.
j. If you were planning an engineering conference, which would you choose as the length of the
conference: mean; median; or mode? Explain why you made that choice.
k. Give two reasons why you think that 3-5 days seem to be popular lengths of engineering
conferences.



Exercise 2.11.18 

A survey of enrollment at 35 community colleges across the United States yielded the following 
figures (source: Microsoft Bookshelf): 

6414; 1550; 2109; 9350; 21828; 4300; 5944; 5722; 2825; 2044; 5481; 5200; 5853; 2750; 10012; 6357; 
27000; 9414; 7681; 3200; 17500; 9200; 7380; 18314; 6557; 13713; 17768; 7493; 2771; 2861; 1263; 7285; 
28165; 5080; 11622 

a. Organize the data into a chart with five intervals of equal width. Label the two columns
"Enrollment" and "Frequency."

b. Construct a histogram of the data.

c. If you were to build a new community college, which piece of information would be more
valuable: the mode or the average size?

d. Calculate the sample average.

e. Calculate the sample standard deviation.

f. A school with an enrollment of 8000 would be how many standard deviations away from the
mean?

Exercise 2.11.19 (Solution on p. 94.) 

The median age of the U.S. population in 1980 was 30.0 years. In 1991, the median age was 33.1 
years. (Source: Bureau of the Census) 

a. What does it mean for the median age to rise? 

b. Give two reasons why the median age could rise. 

c. For the median age to rise, is the actual number of children less in 1991 than it was in 1980? 

Why or why not? 



Exercise 2.11.20 

A survey was conducted of 130 purchasers of new BMW 3 series cars, 130 purchasers of new 
BMW 5 series cars, and 130 purchasers of new BMW 7 series cars. In it, people were asked the age 
they were when they purchased their car. The following box plots display the results. 

[Figure: box plots for BMW 3 series, BMW 5 series, and BMW 7 series on a common axis running from 25 to 80 in steps of 5]



a. In complete sentences, describe what the shape of each box plot implies about the distribution
of the data collected for that car series.

b. Which group is most likely to have an outlier? Explain how you determined that.

c. Compare the three box plots. What do they imply about the age of purchasing a BMW from the
series when compared to each other?

d. Look at the BMW 5 series. Which quarter has the smallest spread of data? What is that spread?

e. Look at the BMW 5 series. Which quarter has the largest spread of data? What is that spread?

f. Look at the BMW 5 series. Estimate the Inter Quartile Range (IQR).

g. Look at the BMW 5 series. Are there more data in the interval 31-38 or in the interval 45-55?
How do you know this?

h. Look at the BMW 5 series. Which interval has the fewest data in it? How do you know this?

i. 31-35
ii. 38-41
iii. 41-64

Exercise 2.11.21 (Solution on p. 94.)

The following box plot shows the U.S. population for 1990, the latest available year. (Source:
Bureau of the Census, 1990 Census)

[Figure: box plot; recoverable axis values: 17, 33, 50, 105]

a. Are there fewer or more children (age 17 and under) than senior citizens (age 65 and over)?
How do you know?

b. 12.6% are age 65 and over. Approximately what percent of the population are of working age
adults (above age 17 to age 65)?

Exercise 2.11.22 

Javier and Ercilia are supervisors at a shopping mall. Each was given the task of estimating the 
mean distance that shoppers live from the mall. They each randomly surveyed 100 shoppers. The 
samples yielded the following information: 





      Javier       Ercilia
x̄     6.0 miles    6.0 miles
s     4.0 miles    7.0 miles

Table 2.21

a. How can you determine which survey was correct?

b. Explain what the difference in the results of the surveys implies about the data.

c. If the two histograms depict the distribution of values for each supervisor, which one depicts
Ercilia's sample? How do you know?

[Figure 2.2: two histograms, labeled i and ii]

d. If the two box plots depict the distribution of values for each supervisor, which one depicts
Ercilia's sample? How do you know?

[Figure 2.3: two box plots]



Exercise 2.11.23 (Solution on p. 94.)

Student grades on a chemistry exam were:

77, 78, 76, 81, 86, 51, 79, 82, 84, 99

a. Construct a stem-and-leaf plot of the data.

b. Are there any potential outliers? If so, which scores are they? Why do you consider them
outliers?



2.11.1 Try these multiple choice questions (Exercises 24 - 30). 

The next three questions refer to the following information. We are interested in the number of years 
students in a particular elementary statistics class have lived in California. The information in the following 
table is from the entire section. 



Number of years    Frequency
7                  1
14                 3
15                 1
18                 1
19                 4
20                 3
22                 1
23                 1
26                 1
40                 2
42                 2
                   Total = 20

Table 2.22



Exercise 2.11.24 (Solution on p. 94.)

What is the IQR?

A. 8

B. 11

C. 15

D. 35

Exercise 2.11.25 (Solution on p. 94.)

What is the mode?

A. 19

B. 19.5

C. 14 and 20

D. 22.65

Exercise 2.11.26 (Solution on p. 94.)

Is this a sample or the entire population?

A. sample

B. entire population

C. neither



The next two questions refer to the following table. X = the number of days per week that 100 clients use 
a particular exercise facility. 



X    Frequency
0    3
1    12
2    33
3    28
4    11
5    9
6    4

Table 2.23



Exercise 2.11.27 

The 80th percentile is: 

A. 5 

B. 80 

C. 3 

D. 4 



(Solution on p. 94.) 



Exercise 2.11.28 (Solution on p. 94.)

The number that is 1.5 standard deviations BELOW the mean is approximately:

A. 0.7

B. 4.8

C. -2.8

D. Cannot be determined

The next two questions refer to the following histogram. Suppose one hundred eleven people who
shopped in a special T-shirt store were asked the number of T-shirts they own costing more than $19 each.

[Figure: relative frequency histogram. Vertical axis: Relative Frequency, with ticks at 10/111, 20/111, 30/111, and 40/111. Horizontal axis: Number of T-shirts costing more than $19 each. Bar heights, in no recoverable order: 39/111, 25/111, 23/111, 17/111, 5/111, 2/111.]

Exercise 2.11.29 (Solution on p. 94.) 

The percent of people that own at most three (3) T-shirts costing more than $19 each is approxi- 
mately: 

A. 21 

B. 59 

C. 41 

D. Cannot be determined 

Exercise 2.11.30 (Solution on p. 94.) 

If the data were collected by asking the first 111 people who entered the store, then the type of 
sampling is: 

A. cluster 

B. simple random 

C. stratified 

D. convenience 



Exercise 2.11.31 (Solution on p. 94.) 

Below are the 2010 obesity rates by U.S. states and Washington, DC. (Source:
http://www.cdc.gov/obesity/data/adult.html)

State             Percent (%)    State             Percent (%)
Alabama           32.2           Montana           23.0
Alaska            24.5           Nebraska          26.9
Arizona           24.3           Nevada            22.4
Arkansas          30.1           New Hampshire     25.0
California        24.0           New Jersey        23.8
Colorado          21.0           New Mexico        25.1
Connecticut       22.5           New York          23.9
Delaware          28.0           North Carolina    27.8
Washington, DC    22.2           North Dakota      27.2
Florida           26.6           Ohio              29.2
Georgia           29.6           Oklahoma          30.4
Hawaii            22.7           Oregon            26.8
Idaho             26.5           Pennsylvania      28.6
Illinois          28.2           Rhode Island      25.5
Indiana           29.6           South Carolina    31.5
Iowa              28.4           South Dakota      27.3
Kansas            29.4           Tennessee         30.8
Kentucky          31.3           Texas             31.0
Louisiana         31.0           Utah              22.5
Maine             26.8           Vermont           23.2
Maryland          27.1           Virginia          26.0
Massachusetts     23.0           Washington        25.5
Michigan          30.9           West Virginia     32.5
Minnesota         24.8           Wisconsin         26.3
Mississippi       34.0           Wyoming           25.1
Missouri          30.5

Table 2.24

a. Construct a bar graph of obesity rates of your state and the four states closest to your state.
Hint: Label the x-axis with the states.
b. Use a random number generator to randomly pick 8 states. Construct a bar graph of the obesity
rates of those 8 states.
c. Construct a bar graph for all the states beginning with the letter "A."
d. Construct a bar graph for all the states beginning with the letter "M."



Exercise 2.11.32 (Solution on p. 95.) 

A music school has budgeted to purchase 3 musical instruments. They plan to purchase a piano
costing $3000, a guitar costing $550, and a drum set costing $600. The average cost for a piano is
$4,000 with a standard deviation of $2,500. The average cost for a guitar is $500 with a standard
deviation of $200. The average cost for drums is $700 with a standard deviation of $100. Which
cost is the lowest, when compared to other instruments of the same type? Which cost is the highest
when compared to other instruments of the same type? Justify your answer numerically.

Exercise 2.11.33 (Solution on p. 95.) 

Suppose that a publisher conducted a survey asking adult consumers the number of fiction paperback
books they had purchased in the previous month. The results are summarized in the table
below. (Note that this is the data presented for publisher B in homework exercise 13).

Publisher B

# of books    Freq.    Rel. Freq.
0             18
1             24
2             24
3             22
4             15
5             10
7             5
9             1

Table 2.25

a. Are there any outliers in the data? Use an appropriate numerical test involving the IQR to
identify outliers, if any, and clearly state your conclusion.

b. If a data value is identified as an outlier, what should be done about it?

c. Are any data values further than 2 standard deviations away from the mean? In some situations,
statisticians may use this criterion to identify data values that are unusual, compared
to the other data values. (Note that this criterion is most appropriate to use for data that is
mound-shaped and symmetric, rather than for skewed data.)

d. Do parts (a) and (c) of this problem give the same answer?

e. Examine the shape of the data. Which part, (a) or (c), of this question gives a more appropriate
result for this data?

f. Based on the shape of the data, which is the most appropriate measure of center for this data:
mean, median or mode?

Exercises 32 and 33 contributed by Roberta Bloom



2.12 Lab: Descriptive Statistics 14 

Class Time: 
Names: 

2.12.1 Student Learning Outcomes 

• The student will construct a histogram and a box plot. 

• The student will calculate univariate statistics. 

• The student will examine the graphs to interpret what the data implies. 

2.12.2 Collect the Data 

Record the number of pairs of shoes you own: 

1. Randomly survey 30 classmates. Record their values. 

Survey Results 



Table 2.26 
2. Construct a histogram. Make 5-6 intervals. Sketch the graph using a ruler and pencil. Scale the axes.

14 This content is available online at <http://cnx.org/content/m16299/1.13/>.

[Figure 2.4: blank axes for the histogram. Vertical axis: Frequency. Horizontal axis: Number of Pairs of Shoes.]



3. Calculate the following: 



4. Are the data discrete or continuous? How do you know? 

5. Describe the shape of the histogram. Use complete sentences. 

6. Are there any potential outliers? Which value(s) is (are) it (they)? Use a formula to check the end 
values to determine if they are potential outliers. 



2.12.3 Analyze the Data 

1. Determine the following: 

• Minimum value = 

• Median = 

• Maximum value = 

• First quartile = 

• Third quartile = 

• IQR = 

2. Construct a box plot of data 

3. What does the shape of the box plot imply about the concentration of data? Use complete sentences. 

4. Using the box plot, how can you determine if there are potential outliers? 

5. How does the standard deviation help you to determine concentration of the data and whether or not 
there are potential outliers? 

6. What does the IQR represent in this problem? 

7. Show your work to find the value that is 1.5 standard deviations: 

a. Above the mean: 

b. Below the mean:

Solutions to Exercises in Chapter 2

Solution to Example 2.2, Problem (p. 47)

The value 12.3 may be an outlier. Values appear to concentrate at 3 and 4 kilometers.

Solution to Example 2.7, Problem (p. 52)

• 3.5 to 4.5

• 4.5 to 5.5

• 6

• 5.5 to 6.5

Stem    Leaf
1       1 5
2       3 5 7
3       2 3 3 5 8
4       0 2 5 5 7 8
5       5 6
6       5 7
7
8
9
10
11
12      3

Table 2.27



Solutions to Practice 1: Center of the Data 

Solution to Exercise 2.9.1 (p. 68) 
65 

Solution to Exercise 2.9.2 (p. 68) 
1 

Solution to Exercise 2.9.5 (p. 69) 
4.75 

Solution to Exercise 2.9.6 (p. 69) 
1.39 

Solution to Exercise 2.9.7 (p. 69) 
65 

Solution to Exercise 2.9.8 (p. 69) 
4 

Solution to Exercise 2.9.9 (p. 69) 
4 

Solution to Exercise 2.9.10 (p. 69)
4

Solution to Exercise 2.9.11 (p. 69)

4 

Solution to Exercise 2.9.12 (p. 69) 

6 

Solution to Exercise 2.9.13 (p. 69) 

6-4 = 2 

Solution to Exercise 2.9.14 (p. 69) 

3 

Solution to Exercise 2.9.15 (p. 69) 

6 

Solution to Exercise 2.9.16 (p. 70) 

a. 8.93 

b. 0.58 

Solutions to Practice 2: Spread of the Data 

Solution to Exercise 2.10.1 (p. 71) 

6 

Solution to Exercise 2.10.2 (p. 71) 

a. 1447.5 

b. 528.5 

Solution to Exercise 2.10.3 (p. 71) 

474 FTES 

Solution to Exercise 2.10.4 (p. 71) 

50% 

Solution to Exercise 2.10.5 (p. 71) 

919 

Solution to Exercise 2.10.6 (p. 71) 

0.03 
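
The answers to Exercises 2.10.1, 2.10.5, and 2.10.6 follow directly from the population parameters given in section 2.10.2. The worked arithmetic below is a sketch added for illustration; the shorthand Q1 and Q3 for the first and third quartiles is introduced here and is not part of the original exercise.

```latex
\begin{align*}
\text{Exercise 2.10.1:}\quad & 11 \times 0.50 = 5.5 \approx 6
  \quad \text{(about half of all values lie at or above the median, 1014)} \\
\text{Exercise 2.10.5:}\quad & \text{IQR} = Q_3 - Q_1 = 1447.5 - 528.5 = 919 \\
\text{Exercise 2.10.6:}\quad & \frac{\text{median} - \mu}{\sigma}
  = \frac{1014 - 1000}{474} \approx 0.03
\end{align*}
```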

Solutions to Homework 

Solution to Exercise 2.11.1 (p. 72) 

a. 1.48 

b. 1.12 

e. 1 

f. 1

g. 2

i. 80%

j. 1

k. 3 
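
Parts a, b, and i above can be checked directly from the frequency table. The Python sketch below is an illustration only: the function names are hypothetical, and it assumes the first row of Table 2.11 is 0 movies with frequency 5, which is consistent with the stated mean of 1.48.

```python
import math

# Frequency table from Exercise 2.11.1 (Table 2.11): value -> frequency.
# Assumption: the first row is 0 movies (frequency 5).
movies = {0: 5, 1: 9, 2: 6, 3: 4, 4: 1}


def freq_mean(table):
    """Mean of data given as {value: frequency}."""
    n = sum(table.values())
    return sum(x * f for x, f in table.items()) / n


def freq_sample_sd(table):
    """Sample standard deviation (divides by n - 1) from a frequency table."""
    n = sum(table.values())
    mean = freq_mean(table)
    squared_dev = sum(f * (x - mean) ** 2 for x, f in table.items())
    return math.sqrt(squared_dev / (n - 1))


def percent_below(table, cutoff):
    """Percent of observations strictly less than cutoff."""
    n = sum(table.values())
    return 100 * sum(f for x, f in table.items() if x < cutoff) / n


print(round(freq_mean(movies), 2))       # part a
print(round(freq_sample_sd(movies), 2))  # part b
print(percent_below(movies, 3))          # part i
```

Weighting each value by its frequency gives the same result as listing all 25 responses individually, which is why the table alone is enough.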






Solution to Exercise 2.11.3 (p. 72) 



a. 3.78 

b. 1.29 

e. 3 

f. 4 

g. 5

h. 1 3 4 5 7

i. 32.5%

j. 4

k. 5

Solution to Exercise 2.11.5 (p. 74) 

b. 241 

c. 205.5 

d. 272.5 



e. 174 205.5 241 272.5 302 

f. 205.5,272.5 

g. sample 

h. population 
i. i. 236.34 

ii. 37.50 

iii. 161.34 

iv. 0.84 std. dev. below the mean 
j. Young 
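
The median and quartiles in parts b through e can be reproduced from the raw weights. The sketch below is illustrative: `five_number_summary` is a hypothetical helper, and it assumes the quartiles are the medians of the lower and upper halves of the ordered data, with the middle value left out when the count is odd. That rule agrees with the answers above, but other software may use a different quartile convention and give slightly different values.

```python
from statistics import median

# Published weights (in pounds) from Exercise 2.11.5.
weights = [
    177, 205, 210, 210, 232, 205, 185, 185, 178, 210, 206, 212, 184, 174,
    185, 242, 188, 212, 215, 247, 241, 223, 220, 260, 245, 259, 278, 270,
    280, 295, 275, 285, 290, 272, 273, 280, 285, 286, 200, 215, 185, 230,
    250, 241, 190, 260, 250, 302, 265, 290, 276, 228, 265,
]


def five_number_summary(data):
    """Min, Q1, median, Q3, max using the median-of-halves rule.

    When the number of values is odd, the middle value is left out of
    both halves before the quartiles are taken.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


summary = five_number_summary(weights)
print(summary)
```

The five values returned are the ones a box plot for part e is drawn from.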

Solution to Exercise 2.11.9 (p. 75) 

Kamala 

Solution to Exercise 2.11.15 (p. 80) 

a. True 

b. True 

c. True 

d. False 

Solution to Exercise 2.11.17 (p. 81) 

b. 4, 3, 5

c. 4

d. 3

e. 2 3 4 5 9

f. 3, 5

g. 3.94

h. 1.28

i. 3

j. mode

Solution to Exercise 2.11.19 (p. 82) 

c. Maybe 

Solution to Exercise 2.11.21 (p. 83) 

a. more children 

b. 62.4% 

Solution to Exercise 2.11.23 (p. 84) 

b. 51,99 

Solution to Exercise 2.11.24 (p. 84) 

A 

Solution to Exercise 2.11.25 (p. 85) 

A 

Solution to Exercise 2.11.26 (p. 85) 

B 

Solution to Exercise 2.11.27 (p. 85) 

D 

Solution to Exercise 2.11.28 (p. 85) 

A 

Solution to Exercise 2.11.29 (p. 86) 

C 

Solution to Exercise 2.11.30 (p. 86) 

D 

Solution to Exercise 2.11.31 (p. 86) 

Example solution for b using the random number generator for the TI-84 Plus to generate a simple random
sample of 8 states. Instructions are below.

Number the entries in the table 1-51 (Includes Washington, DC; Numbered vertically)
Press MATH
Arrow over to PRB
Press 5:randInt(
Enter 51,1,8)

Eight numbers are generated (use the right arrow key to scroll through the numbers). The
numbers correspond to the numbered states (for this example: {47 21 9 23 51 13 25 4}). If
any numbers are repeated, generate a different number by using 5:randInt(51,1). Here, the
states (and Washington DC) are {Arkansas, Washington DC, Idaho, Maryland, Michigan, Mississippi,
Virginia, Wyoming}. Corresponding percents are {28.7 21.8 24.5 26 28.9 32.8 25 24.6}.

[Figure: bar graph of the eight sampled obesity rates. Vertical axis: Percent (%), scaled from 10 to 40 in steps of 5. Horizontal axis: Arkansas, Wash DC, Idaho, Maryland, Michigan, Mississippi, Virginia, Wyoming.]
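
For readers without a TI-84, the same selection can be sketched in Python. This is an alternative to the calculator steps, not a translation of them: `pick_entries` is a hypothetical helper, and it uses the standard library's `random.sample`, which draws without replacement, so the "regenerate any repeated number" step is not needed.

```python
import random

NUM_ENTRIES = 51  # 50 states plus Washington, DC, numbered 1-51
SAMPLE_SIZE = 8


def pick_entries(seed=None):
    """Return SAMPLE_SIZE distinct entry numbers between 1 and NUM_ENTRIES."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no repeats need to be redrawn.
    return sorted(rng.sample(range(1, NUM_ENTRIES + 1), SAMPLE_SIZE))


picked = pick_entries()
print(picked)
```

Each run produces a different set of eight entry numbers unless a seed is passed; the numbers are then matched to the states as numbered in the table.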



Solution to Exercise 2.11.32 (p. 87) 

For pianos, the cost of the piano is 0.4 standard deviations BELOW average. For guitars, the cost of the 
guitar is 0.25 standard deviations ABOVE average. For drums, the cost of the drum set is 1.0 standard 
deviations BELOW average. Of the three, the drums cost the lowest in comparison to the cost of other 
instruments of the same type. The guitar cost the most in comparison to the cost of other instruments of the 
same type. 
Solution to Exercise 2.11.33 (p. 88) 

• IQR = 4 - 1 = 3; Q1 - 1.5*IQR = 1 - 1.5(3) = -3.5; Q3 + 1.5*IQR = 4 + 1.5(3) = 8.5. The data value of 9 is
larger than 8.5. The purchase of 9 books in one month is an outlier.

• The outlier should be investigated to see if there is an error or some other problem in the data; then a 
decision whether to include or exclude it should be made based on the particular situation. If it was 
a correct value then the data value should remain in the data set. If there is a problem with this data 
value, then it should be corrected or removed from the data. For example: If the data was recorded 
incorrectly (perhaps a 9 was miscoded and the correct value was 6) then the data should be corrected. 
If it was an error but the correct value is not known it should be removed from the data set. 

• xbar - 2s = 2.45 - 2*1.88 = -1.31 ; xbar + 2s = 2.45 + 2*1.88 = 6.21 ; Using this method, the five data values 
of 7 books purchased and the one data value of 9 books purchased would be considered unusual. 

• No: part (a) identifies only the value of 9 to be an outlier but part (c) identifies both 7 and 9. 

• The data is skewed (to the right). It would be more appropriate to use the method involving the IQR 
in part (a), identifying only the one value of 9 books purchased as an outlier. Note that part (c) remarks 
that identifying unusual data values by using the criteria of being further than 2 standard deviations 
away from the mean is most appropriate when the data are mound-shaped and symmetric. 

• The data are skewed to the right. For skewed data it is more appropriate to use the median as a 
measure of center. 
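
The two outlier rules compared in this solution can be checked with a short Python sketch. The quartiles, mean, and standard deviation are the values quoted above; the function names are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def two_sd_bounds(mean, sd):
    """Return (mean - 2*sd, mean + 2*sd)."""
    return mean - 2 * sd, mean + 2 * sd

lower_fence, upper_fence = iqr_fences(q1=1, q3=4)       # (-3.5, 8.5)
lower_2s, upper_2s = two_sd_bounds(mean=2.45, sd=1.88)  # about (-1.31, 6.21)

for books in (7, 9):
    print(books,
          "IQR outlier:", books < lower_fence or books > upper_fence,
          "beyond 2 sd:", books < lower_2s or books > upper_2s)
```

Only 9 falls outside the IQR fences, while both 7 and 9 fall more than 2 standard deviations from the mean, which is the disagreement described above.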
Chapter 3

Linear Regression and Correlation 

3.1 Linear Regression and Correlation 1 

3.1.1 Student Learning Objectives 

By the end of this chapter, the student should be able to: 

• Discuss basic ideas of linear regression and correlation. 

• Create and interpret a line of best fit. 

• Calculate and interpret the correlation coefficient. 

• Calculate and interpret outliers. 



3.1.2 Introduction 

Professionals often want to know how two or more variables are related. For example, is there a relationship 
between the grade on the second math exam a student takes and the grade on the final exam? If there is a 
relationship, what is it and how strong is the relationship? 

In another example, your income may be determined by your education, your profession, your years of 
experience, and your ability. The amount you pay a repair person for labor is often determined by an initial 
amount plus an hourly fee. These are all examples in which regression can be used. 

The type of data described in the examples is bivariate data - "bi" for two variables. In reality, statisticians 
use multivariate data, meaning many variables. 

In this chapter, you will be studying the simplest form of regression, "linear regression" with one indepen- 
dent variable (x). This involves data that fits a line in two dimensions. You will also study correlation which 
measures how strong the relationship is. 

3.2 Linear Equations 2 

Linear regression for two variables is based on a linear equation with one independent variable. It has the 
form: 

y = a + bx (3.1) 



1 This content is available online at <http://cnx.org/content/m17089/1.5/>.
2 This content is available online at <http://cnx.org/content/m17086/1.4/>.
where a and b are constant numbers.

x is the independent variable, and y is the dependent variable. Typically, you choose a value to substitute 
for the independent variable and then solve for the dependent variable. 



Example 3.1 

The following examples are linear equations.

y = 3 + 2x (3.2)

y = -0.01 + 1.2x (3.3)



The graph of a linear equation of the form y = a + bx is a straight line. Any line that is not vertical can be 
described by this equation. 

Example 3.2 




Figure 3.1: Graph of the equation y = -1 + 2x.



Linear equations of this form occur in applications of life sciences, social sciences, psychology, business, 
economics, physical sciences, mathematics, and other areas. 

Example 3.3 

Aaron's Word Processing Service (AWPS) does word processing. Its rate is $32 per hour plus a 
$31.50 one-time charge. The total cost to a customer depends on the number of hours it takes to 
do the word processing job. 

Problem 

Find the equation that expresses the total cost in terms of the number of hours required to finish 
the word processing job. 

Solution 

Let x = the number of hours it takes to get the job done. 

Let y = the total cost to the customer. 

The $31.50 is a fixed cost. If it takes x hours to complete the job, then (32)(x) is the cost of the
word processing only. The total cost is:

y = 31.50 + 32x
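
As a quick check of this equation, the cost can be evaluated for a few job lengths. This is a minimal Python sketch; the function name total_cost and the sample hours are hypothetical.

```python
def total_cost(hours):
    """Total charge in dollars: $31.50 one-time fee plus $32 per hour."""
    return 31.50 + 32 * hours

for hours in (0, 1, 2.5):
    print(hours, total_cost(hours))
```

A job of 0 hours costs only the fixed charge, and each additional hour adds $32.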



3.3 Slope and Y-Intercept of a Linear Equation 3 

For the linear equation y = a + bx, b = slope and a = y-intercept.

From algebra recall that the slope is a number that describes the steepness of a line and the y-intercept is 
the y coordinate of the point (0, a) where the line crosses the y-axis. 





Figure 3.2: Three possible graphs of y = a + bx. (a) If b > 0, the line slopes upward to the right. (b) If
b = 0, the line is horizontal. (c) If b < 0, the line slopes downward to the right.



Example 3.4 

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time
fee of $25 plus $15 per hour of tutoring. A linear equation that expresses the total amount of
money Svetlana earns for each session she tutors is y = 25 + 15x.

Problem 

What are the independent and dependent variables? What is the y-intercept and what is the 
slope? Interpret them using complete sentences. 

Solution 

The independent variable (x) is the number of hours Svetlana tutors each session. The dependent 
variable (y) is the amount, in dollars, Svetlana earns for each session. 

The y-intercept is 25 (a = 25). At the start of the tutoring session, Svetlana charges a one-time fee 
of $25 (this is when x = 0). The slope is 15 (b = 15). For each session, Svetlana earns $15 for each 
hour she tutors. 
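
The roles of slope and y-intercept can also be seen by recovering them from two points on the line. The sketch below is an illustration only: the helper name line_through is hypothetical, and the two sessions (1 hour for $40, 3 hours for $70) are computed from y = 25 + 15x rather than taken from data.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + b*x through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # slope: change in y per one-unit change in x
    a = y1 - b * x1             # y-intercept: value of y when x = 0
    return a, b

a, b = line_through((1, 40), (3, 70))
print(a, b)  # 25.0 15.0
```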



3 This content is available online at <http://cnx.org/content/m17083/1.5/>.

3.4 Scatter Plots 4

Before we take up the discussion of linear regression and correlation, we need to examine a way to display 
the relation between two variables x and y. The most common and easiest way is a scatter plot. The 
following example illustrates a scatter plot. 

Example 3.5 

From an article in the Wall Street Journal : In Europe and Asia, m-commerce is becoming more 
popular. M-commerce users have special mobile phones that work like electronic wallets as well as 
provide phone and Internet services. Users can do everything from paying for parking to buying 
a TV set or soda from a machine to banking to checking sports scores on the Internet. In the next 
few years, will there be a relationship between the year and the number of m-commerce users? 
Construct a scatter plot. Let x = the year and let y = the number of m-commerce users, in millions. 



x (year)    y (# of users)
2000        0.5
2002        20.0
2003        33.0
2004        47.0

Figure 3.3: (a) Table showing the number of m-commerce users (in millions) by year. (b) Scatter plot
showing the number of m-commerce users (in millions) by year.



A scatter plot shows the direction and strength of a relationship between the variables. A clear direction 
happens when there is either: 

• High values of one variable occurring with high values of the other variable or low values of one 
variable occurring with low values of the other variable. 

• High values of one variable occurring with low values of the other variable. 

You can determine the strength of the relationship by looking at the scatter plot and seeing how close the 
points are to a line, a power function, an exponential function, or to some other type of function. 

When you look at a scatterplot, you want to notice the overall pattern and any deviations from the pattern. 
The following scatterplot examples illustrate these concepts. 



4 This content is available online at <http://cnx.org/content/m17082/1.6/>.

Figure 3.4: (a) Positive Linear Pattern (Strong) (b) Linear Pattern w/ One Deviation

Figure 3.5: (a) Negative Linear Pattern (Strong) (b) Negative Linear Pattern (Weak)

Figure 3.6: (a) Exponential Growth Pattern (b) No Pattern



In this chapter, we are interested in scatter plots that show a linear pattern. Linear patterns are quite common.
The linear relationship is strong if the points are close to a straight line. If we think that the points
show a linear relationship, we would like to draw a line on the scatter plot. This line can be calculated
through a process called linear regression. However, we only calculate a regression line if one of the variables
helps to explain or predict the other variable. If x is the independent variable and y the dependent
variable, then we can use a regression line to predict y for a given value of x.

3.5 The Regression Equation 5

Data rarely fit a straight line exactly. Usually, you must be satisfied with rough predictions. Typically, you
have a set of data whose scatter plot appears to "fit" a straight line. The line that fits the data best is called
a Line of Best Fit or Least Squares Line.

3.5.1 Optional Collaborative Classroom Activity 

If you know a person's pinky (smallest) finger length, do you think you could predict that person's height? 
Collect data from your class (pinky finger length, in inches). The independent variable, x, is pinky finger 
length and the dependent variable, y, is height. 

For each set of data, plot the points on graph paper. Make your graph big enough and use a ruler. Then
"by eye" draw a line that appears to "fit" the data. For your line, pick two convenient points and use them
to find the slope of the line. Find the y-intercept of the line by extending your line so it crosses the y-axis.
Using the slope and the y-intercept, write your equation of "best fit". Do you think everyone will have
the same equation? Why or why not?

Using your equation, what is the predicted height for a pinky length of 2.5 inches? 
Example 3.6 

A random sample of 11 statistics students produced the following data where x is the third exam
score, out of 80, and y is the final exam score, out of 200. Can you predict the final exam score of a
random student if you know the third exam score?



5 This content is available online at <http://cnx.org/content/m17090/1.14/>.

x (third exam score)    y (final exam score)
65                      175
67                      133
71                      185
71                      163
66                      126
75                      198
67                      153
70                      163
71                      159
69                      151
69                      159

Figure 3.7: (a) Table showing the scores on the final exam based on scores from the third exam. (b) Scatter
plot showing the scores on the final exam based on scores from the third exam.



The third exam score, x, is the independent variable and the final exam score, y, is the dependent variable. 
We will plot a regression line that best "fits" the data. If each of you were to fit a line "by eye", you would 
draw different lines. We can use what is called a least-squares regression line to obtain the best fit line. 

Consider the following diagram. Each data point has the form (x, y), and each point on the line of
best fit using least-squares linear regression has the form $(x, \hat{y})$.

The $\hat{y}$ is read "y hat" and is the estimated value of y. It is the value of y obtained using the regression line.
It is generally not equal to the y from the data.

[Figure: diagram labeling the data point $(x_0, y_0)$, the point on the line $(x_0, \hat{y}_0)$, and the vertical
distance $|y_0 - \hat{y}_0|$ between them]

Figure 3.8



The term $y_0 - \hat{y}_0 = \epsilon_0$ is called the "error" or residual. It is not an error in the sense of a mistake;
its absolute value, $|y_0 - \hat{y}_0|$, measures the vertical distance between the actual value of y and the
estimated value of y. In other words, it measures the vertical distance between the actual data point and
the predicted point on the line.

If the observed data point lies above the line, the residual is positive, and the line underestimates the
actual data value for y. If the observed data point lies below the line, the residual is negative, and the line
overestimates the actual data value for y.

In the diagram above, $y_0 - \hat{y}_0 = \epsilon_0$ is the residual for the point shown. Here the point lies above the line
and the residual is positive.

$\epsilon$ = the Greek letter epsilon

For each data point, you can calculate the residual or error, $y_i - \hat{y}_i = \epsilon_i$ for i = 1, 2, 3, ..., 11.
The absolute value of each $\epsilon_i$ is a vertical distance.



For the example about the third exam scores and the final exam scores for the 11 statistics students, there
are 11 data points. Therefore, there are 11 $\epsilon$ values. If you square each $\epsilon$ and add, you get

$(\epsilon_1)^2 + (\epsilon_2)^2 + \dots + (\epsilon_{11})^2 = \sum_{i=1}^{11} \epsilon_i^2$



This is called the Sum of Squared Errors (SSE). 
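
To make the SSE concrete, the following Python sketch computes it for the exam data of Example 3.6 and two candidate lines. The first line, y = -173.51 + 4.83x, is the one reported for this example later in the section; the second line and the function name sse are hypothetical and serve only as a comparison.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def sse(a, b, xs, ys):
    """Sum of squared residuals y - (a + b*x) over all data points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

sse_fit = sse(-173.51, 4.83, third_exam, final_exam)   # line reported in this section
sse_other = sse(-170.0, 4.8, third_exam, final_exam)   # hypothetical alternative line
print(round(sse_fit, 1), round(sse_other, 1))
```

The alternative line produces the larger sum, which is the sense in which it fits the data less well.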

Using calculus, you can determine the values of a and b that make the SSE a minimum. When you make 
the SSE a minimum, you have determined the points that are on the line of best fit. It turns out that the line 
of best fit has the equation: 



$\hat{y} = a + bx$ (3.4)

where $a = \bar{y} - b \cdot \bar{x}$ and $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$.

$\bar{x}$ and $\bar{y}$ are the averages of the x values and the y values, respectively. The best fit line always passes
through the point $(\bar{x}, \bar{y})$.
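
The formulas for a and b translate directly into code. The sketch below applies them to the data of Example 3.6; the function name least_squares_line is a hypothetical choice, and a spreadsheet or calculator would normally do this work.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept; forces the line through (x_bar, y_bar)
    return a, b

a, b = least_squares_line(third_exam, final_exam)
print(round(a, 2), round(b, 2))  # -173.51 4.83
```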



The slope b can be written as $b = r \left( \frac{s_y}{s_x} \right)$, where $s_y$ = the standard deviation of the y values and $s_x$ = the
standard deviation of the x values. r is the correlation coefficient, which is discussed in the next section.
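
The two expressions for the slope can be compared numerically. This Python sketch uses the data of Example 3.6; the variable names are hypothetical, and statistics.stdev computes the sample standard deviation.

```python
import math
import statistics

third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

x_bar = statistics.mean(third_exam)
y_bar = statistics.mean(final_exam)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(third_exam, final_exam))
sxx = sum((x - x_bar) ** 2 for x in third_exam)
syy = sum((y - y_bar) ** 2 for y in final_exam)

r = sxy / math.sqrt(sxx * syy)           # correlation coefficient
s_x = statistics.stdev(third_exam)       # standard deviation of the x values
s_y = statistics.stdev(final_exam)       # standard deviation of the y values

b_from_r = r * s_y / s_x                 # slope written as r * (s_y / s_x)
b_direct = sxy / sxx                     # slope from the sums of deviations
print(round(b_from_r, 4), round(b_direct, 4))
```

Both expressions give the same slope, up to rounding.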

Least Squares Criteria for Best Fit 

The process of fitting the best fit line is called linear regression. The idea behind finding the best fit line is
based on the assumption that the data are scattered about a straight line. The criterion for the best fit line is
that the sum of the squared errors (SSE) is minimized, that is, made as small as possible. Any other line you
might choose would have a higher SSE than the best fit line. This best fit line is called the least squares
regression line.

NOTE: Computer spreadsheets, statistical software, and many calculators can quickly calculate the
best fit line and create the graphs. The calculations tend to be tedious if done by hand. Instructions
to use the TI-83, TI-83+, and TI-84+ calculators to find the best fit line and create a scatterplot are
shown at the end of this section.

THIRD EXAM vs FINAL EXAM EXAMPLE: 

The graph of the line of best fit for the third exam/final exam example is shown below: 





[Figure: scatter plot of the final exam scores against Third Exam Score, with the line of best fit drawn through the points]

Figure 3.9



The least squares regression line (best fit line) for the third exam/final exam example has the equation: 



$\hat{y} = -173.51 + 4.83x$ (3.5)

NOTE:
Remember, it is always important to plot a scatter diagram first. If the scatter plot indicates that
there is a linear relationship between the variables, then it is reasonable to use a best fit line 
to make predictions for y given x within the domain of x-values in the sample data, but not 
necessarily for x-values outside that domain. 

You could use the line to predict the final exam score for a student who earned a grade of 73 on 
the third exam. 

You should NOT use the line to predict the final exam score for a student who earned a grade of 
50 on the third exam, because 50 is not within the domain of the x-values in the sample data, 
which are between 65 and 75. 
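
One way to respect this restriction in software is to refuse predictions outside the observed x-values. The following is a minimal Python sketch under that assumption; the function name predict_final and the choice to raise an error are hypothetical, not part of the text's method.

```python
def predict_final(third_exam_score, x_min=65, x_max=75):
    """Predict the final exam score from the best fit line, within the sample's x-range only."""
    if not x_min <= third_exam_score <= x_max:
        raise ValueError("score is outside the domain of the sample data")
    return -173.51 + 4.83 * third_exam_score

print(round(predict_final(73), 2))  # 179.08

try:
    predict_final(50)
except ValueError as err:
    print("no prediction:", err)
```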

UNDERSTANDING SLOPE 

The slope of the line, b, describes how changes in the variables are related. It is important to interpret 
the slope of the line in the context of the situation represented by the data. You should be able to write a 
sentence interpreting the slope in plain English. 

INTERPRETATION OF THE SLOPE: The slope of the best fit line tells us how the dependent variable (y) 
changes for every one unit increase in the independent (x) variable, on average. 

THIRD EXAM vs FINAL EXAM EXAMPLE 

Slope: The slope of the line is b = 4.83. 

Interpretation: For a one point increase in the score on the third exam, the final exam score increases by 
4.83 points, on average. 

3.5.2 Using the TI-83+ and TI-84+ Calculators 

Using the Linear Regression T Test: LinRegTTest 

Step 1. In the STAT list editor, enter the X data in list L1 and the Y data in list L2, paired so that the
corresponding (x,y) values are next to each other in the lists. (If a particular pair of values is repeated, enter
it as many times as it appears in the data.)

Step 2. On the STAT TESTS menu, scroll down with the cursor to select the LinRegTTest. (Be careful to select
LinRegTTest as some calculators may also have a different item called LinRegTInt.)

Step 3. On the LinRegTTest input screen enter: Xlist: L1 ; Ylist: L2 ; Freq: 1

Step 4. On the next line, at the prompt $\beta$ or $\rho$, highlight "$\neq 0$" and press ENTER

Step 5. Leave the line for "RegEq:" blank

Step 6. Highlight Calculate and press ENTER.

LinRegTTest Input Screen and Output Screen (TI-83+ and TI-84+ calculators)

Input screen:
LinRegTTest
Xlist: L1
Ylist: L2
Freq: 1
$\beta$ or $\rho$: $\neq 0$   $< 0$   $> 0$
RegEQ:
Calculate

Output screen:
LinRegTTest
y = a + bx
$\beta \neq 0$ and $\rho \neq 0$
t = 2.657560155
p = .0261501512
df = 9
a = -173.513363
b = 4.827394209
s = 16.41237711
$r^2$ = .4396931104
r = .663093591

Figure 3.10



The output screen contains a lot of information. For now we will focus on a few items from the output, and 
will return later to the other items. 

The second line says y = a + bx. Scroll down to find the values a = -173.513 and b = 4.8273; the equation of the
best fit line is $\hat{y} = -173.51 + 4.83x$.

The two items at the bottom are $r^2$ = .43969 and r = .663. For now, just note where to find these values; we
will discuss them in the next two sections.

Graphing the Scatterplot and Regression Line 

Step 1. We are assuming your X data is already entered in list L1 and your Y data is in list L2

Step 2. Press 2nd STATPLOT ENTER to use Plot 1

Step 3. On the input screen for PLOT 1, highlight On and press ENTER

Step 4. For TYPE: highlight the very first icon which is the scatterplot and press ENTER

Step 5. Indicate Xlist: L1 and Ylist: L2

Step 6. For Mark: it does not matter which symbol you highlight.

Step 7. Press the ZOOM key and then the number 9 (for menu item "ZoomStat"); the calculator will fit the
window to the data

Step 8. To graph the best fit line, press the "Y=" key and type the equation -173.5+4.83X into equation Y1.
(The X key is immediately left of the STAT key). Press ZOOM 9 again to graph it.

Step 9. Optional: If you want to change the viewing window, press the WINDOW key. Enter your desired
window using Xmin, Xmax, Ymin, Ymax



**With contributions from Roberta Bloom

3.6 Correlation Coefficient and Coefficient of Determination 6

3.6.1 The Correlation Coefficient r 

Besides looking at the scatter plot and seeing that a line seems reasonable, how can you tell if the line is a 
good predictor? Use the correlation coefficient as another indicator (besides the scatterplot) of the strength 
of the relationship between x and y. 

The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is a numerical measure of the
strength of association between the independent variable x and the dependent variable y.

The correlation coefficient is calculated as

$r = \frac{n \sum xy - \left( \sum x \right) \left( \sum y \right)}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}$

where n = the number of data points.
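
The formula can be evaluated step by step in code. This Python sketch applies it to the third exam/final exam data from the previous section; the function name correlation is a hypothetical choice.

```python
import math

third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def correlation(xs, ys):
    """Correlation coefficient r computed from the sums in the formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation(third_exam, final_exam)
print(round(r, 4))  # 0.6631
```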



If you suspect a linear relationship between x and y, then r can measure how strong the linear relationship 
is. 

What the VALUE of r tells us: 

• The value of r is always between -1 and +1: $-1 \le r \le 1$.

• The closer the correlation coefficient r is to -1 or 1 (and the further from 0), the stronger the evidence
of a significant linear relationship between x and y; this would indicate that the observed data points
fit more closely to the best fit line. Values of r further from 0 indicate a stronger linear relationship
between x and y. Values of r closer to 0 indicate a weaker linear relationship between x and y.

• If r = 0 there is absolutely no linear relationship between x and y (no linear correlation).

• If r = 1, there is perfect positive correlation. If r = -1, there is perfect negative correlation. In both
of these cases, all of the original data points lie on a straight line. Of course, in the real world, this will
not generally happen.

What the SIGN of r tells us 

• A positive value of r means that when x increases, y increases and when x decreases, y decreases 
(positive correlation). 

• A negative value of r means that when x increases, y decreases and when x decreases, y increases 
(negative correlation). 

• The sign of r is the same as the sign of the slope, b, of the best fit line. 

NOTE: Strong correlation does not suggest that x causes y or y causes x. We say "correlation does 
not imply causation." For example, every person who learned math in the 17th century is dead. 
However, learning math does not necessarily cause death! 



6 This content is available online at <http://cnx.org/content/m17092/1.11/>.

Figure 3.11: (a) A scatter plot showing data with a positive correlation, $0 < r < 1$. (b) A scatter plot
showing data with a negative correlation, $-1 < r < 0$. (c) A scatter plot showing data with zero correlation,
r = 0.



The formula for r looks formidable. However, computer spreadsheets, statistical software, and many cal- 
culators can quickly calculate r. The correlation coefficient r is the bottom item in the output screens for the 
LinRegTTest on the TI-83, TI-83+, or TI-84+ calculator (see previous section for instructions). 

3.6.2 The Coefficient of Determination 

$r^2$ is called the coefficient of determination. $r^2$ is the square of the correlation coefficient, but is usually
stated as a percent, rather than in decimal form. $r^2$ has an interpretation in the context of the data:

• $r^2$, when expressed as a percent, represents the percent of variation in the dependent variable y that
can be explained by variation in the independent variable x using the regression (best fit) line.

• $1 - r^2$, when expressed as a percent, represents the percent of variation in y that is NOT explained by
variation in x using the regression line. This can be seen as the scattering of the observed data points
about the regression line.

Consider the third exam/final exam example introduced in the previous section.

The line of best fit is: $\hat{y} = -173.51 + 4.83x$

The correlation coefficient is r = 0.6631

The coefficient of determination is $r^2 = 0.6631^2 = 0.4397$

Interpretation of $r^2$ in the context of this example:

Approximately 44% of the variation in the final exam grades can be explained by the variation in the 
grades on the third exam, using the best fit regression line. 

Therefore approximately 56% of the variation in the final exam grades can NOT be explained by the vari- 
ation in the grades on the third exam, using the best fit regression line. (This is seen as the scattering 
of the points about the line.) 
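
The "explained" and "unexplained" percents can be reproduced from the data. The Python sketch below relies on a standard identity for the least-squares line that is not stated in this text: $r^2 = 1 - SSE/SST$, where SST (an added name) is the sum of squared deviations of y from its mean. The values of a and b are copied from the LinRegTTest output.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

a, b = -173.513363, 4.827394209   # intercept and slope from the LinRegTTest output

y_bar = sum(final_exam) / len(final_exam)
sst = sum((y - y_bar) ** 2 for y in final_exam)              # total variation in y
sse = sum((y - (a + b * x)) ** 2                             # variation left about the line
          for x, y in zip(third_exam, final_exam))

r_squared = 1 - sse / sst
print(f"explained: {r_squared:.0%}, unexplained: {1 - r_squared:.0%}")
```

The result is about 44% explained and 56% unexplained, matching the interpretation above.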

**With contributions from Roberta Bloom.

3.7 Testing the Significance of the Correlation Coefficient 7

3.7.1 Testing the Significance of the Correlation Coefficient 

The correlation coefficient, r, tells us about the strength of the linear relationship between x and y. However, 
the reliability of the linear model also depends on how many observed data points are in the sample. We 
need to look at both the value of the correlation coefficient r and the sample size n, together. 

We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the 
linear relationship in the sample data is strong enough and reliable enough to use to model the relationship 
in the population. 

The sample data is used to compute r, the correlation coefficient for the sample. If we had data for the entire 
population, we could find the population correlation coefficient. But because we only have sample data, we 
can not calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate 
of the unknown population correlation coefficient. 

The symbol for the population correlation coefficient is $\rho$, the Greek letter "rho".

$\rho$ = population correlation coefficient (unknown)

r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient $\rho$ is "close to
0" or "significantly different from 0". We decide this based on the sample correlation coefficient r and the
sample size n.

If the test concludes that the correlation coefficient is significantly different from 0, we say that the 
correlation coefficient is "significant". 

• Conclusion: "The correlation coefficient IS SIGNIFICANT" 

• What the conclusion means: We believe that there is a significant linear relationship between x and y. 
We can use the regression line to model the linear relationship between x and y in the population. 

If the test concludes that the correlation coefficient is not significantly different from 0 (it is close to 0),
we say that the correlation coefficient is "not significant".

• Conclusion: "The correlation coefficient IS NOT SIGNIFICANT."

• What the conclusion means: We do NOT believe that there is a significant linear relationship between
x and y. Therefore we can NOT use the regression line to model a linear relationship between x and y
in the population.

NOTE: 

• If r is significant and the scatter plot shows a reasonable linear trend, the line can be used to 
predict the value of y for values of x that are within the domain of observed x values. 

• If r is not significant OR if the scatter plot does not show a reasonable linear trend, the line 
should not be used for prediction. 

• Even if r is significant and the scatter plot shows a reasonable linear trend, the line may NOT be
appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

PERFORMING THE HYPOTHESIS TEST
SETTING UP THE HYPOTHESES:

• Null Hypothesis: $H_0$: $\rho = 0$

• Alternate Hypothesis: $H_a$: $\rho \neq 0$

What the hypotheses mean in words:

• Null Hypothesis $H_0$: The population correlation coefficient IS NOT significantly different from 0.
There IS NOT a significant linear relationship (correlation) between x and y in the population.

• Alternate Hypothesis $H_a$: The population correlation coefficient IS significantly DIFFERENT FROM
0. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

7 This content is available online at <http://cnx.org/content/m17077/1.14/>.

DRAWING A CONCLUSION: 

There are two methods to make the decision. Both methods are equivalent and give the same result. 

Method 1: Using the p-value 

Method 2: Using a table of critical values 

In this chapter of this textbook, we will always use a significance level of 5%, $\alpha = 0.05$.

Note: Using the p-value method, you could choose any appropriate significance level you want; you are
not limited to using $\alpha = 0.05$. But the table of critical values provided in this textbook assumes that
we are using a significance level of 5%, $\alpha = 0.05$. (If we wanted to use a different significance level
than 5% with the critical value method, we would need different tables of critical values that are not
provided in this textbook.)

METHOD 1: Using a p-value to make a decision 

The linear regression t-test LinRegTTest on the TI-83+ or TI-84+ calculators calculates the p-value.
On the LinRegTTest input screen, on the line prompt for $\beta$ or $\rho$, highlight "$\neq 0$".
The output screen shows the p-value on the line that reads "p=".
(Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level ($\alpha = 0.05$):

• Decision: REJECT the null hypothesis. 

• Conclusion: "The correlation coefficient IS SIGNIFICANT." 

• We believe that there IS a significant linear relationship between x and y because the correlation 
coefficient is significantly different from 0. 

If the p-value is NOT less than the significance level ($\alpha = 0.05$):

• Decision: DO NOT REJECT the null hypothesis. 

• Conclusion: "The correlation coefficient is NOT significant." 

• We believe that there is NOT a significant linear relationship between x and y because the correlation 
coefficient is NOT significantly different from 0. 

Calculation Notes: 

You will use technology to calculate the p-value. The following describes the calculations to compute the
test statistic and the p-value:

The p-value is calculated using a t-distribution with n - 2 degrees of freedom.

The formula for the test statistic is $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$. The value of the test statistic, t, is shown in the computer
or calculator output along with the p-value. The test statistic t has the same sign as the correlation
coefficient r.

The p-value is the probability (area) in both tails further out beyond the values -t and t.

For the TI-83+ and TI-84+ calculators, the command 2*tcdf(abs(t),10^99, n-2) computes the p-value given
by the LinRegTTest; abs(t) denotes the absolute value $|t|$.
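
For readers without a calculator, the same two quantities can be approximated in Python. This is a rough sketch, not the calculator's algorithm: it stands in for tcdf by integrating the t-distribution density numerically with Simpson's rule, and the function names and step count are hypothetical choices.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Area in both tails beyond -|t| and |t|, using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3           # P(0 < T < |t|)
    return 1 - 2 * central_half

t = t_statistic(0.6631, 11)
p = two_tailed_p(t, 11 - 2)
print(round(t, 3), round(p, 3))
```

With r = 0.6631 and n = 11, the results agree with the t and p values reported by LinRegTTest to about three decimal places.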
THIRD EXAM vs FINAL EXAM EXAMPLE: p-value method

• Consider the third exam/final exam example.

• The line of best fit is: $\hat{y} = -173.51 + 4.83x$ with r = 0.6631 and there are n = 11 data points.

• Can the regression line be used for prediction? Given a third exam score (x value), can we use the
line to predict the final exam score (predicted y value)?

$H_0$: $\rho = 0$
$H_a$: $\rho \neq 0$
$\alpha = 0.05$

The p-value is 0.026 (from LinRegTTest on your calculator or from computer software)
The p-value, 0.026, is less than the significance level of $\alpha = 0.05$
Decision: Reject the Null Hypothesis $H_0$
Conclusion: The correlation coefficient IS SIGNIFICANT.

Because r is significant and the scatter plot shows a reasonable linear trend, the regression line can be 
used to predict final exam scores. 

METHOD 2: Using a table of Critical Values to make a decision 

The 95% Critical Values of the Sample Correlation Coefficient Table (Section 3.10) at the end of this 
chapter (before the Summary (Section 3.11)) may be used to give you a good idea of whether the computed 
value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between 
the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then 
you may want to use the line for prediction. 

Example 3.7 

Suppose you computed r = 0.801 using n = 10 data points. df = n - 2 = 10 - 2 = 8. The
critical values associated with df = 8 are -0.632 and +0.632. If r < negative critical value or r >
positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and
the line may be used for prediction. If you view this example on a number line, it will help you.

[Number line from -1 to +1 with -0.632, +0.632, and r = +0.801 marked]



Figure 3.12: r is not significant between -0.632 and +0.632. r = 0.801 > + 0.632. Therefore, r is significant. 



Example 3.8 

Suppose you computed r = -0.624 with 14 data points. df = 14 - 2 = 12. The critical values are
-0.532 and 0.532. Since -0.624 < -0.532, r is significant and the line may be used for prediction.

[Number line with r = -0.624, -0.532, and +0.532 marked]

Figure 3.13: r = -0.624 < -0.532. Therefore, r is significant.
Example 3.9

Suppose you computed r = 0.776 and n = 6. df = 6 - 2 = 4. The critical values are -0.811
and 0.811. Since -0.811 < 0.776 < 0.811, r is not significant and the line should not be used for
prediction.

[Number line with -0.811, r = 0.776, and 0.811 marked]

Figure 3.14: -0.811 < r = 0.776 < 0.811. Therefore, r is not significant.



THIRD EXAM vs FINAL EXAM EXAMPLE: critical value method 

• Consider the third exam/final exam example.

• The line of best fit is: ŷ = -173.51 + 4.83x with r = 0.6631 and there are n = 11 data points.

• Can the regression line be used for prediction? Given a third exam score (x value), can we use the
line to predict the final exam score (predicted y value)?

Ho: ρ = 0
Ha: ρ ≠ 0
α = 0.05

Use the "95% Critical Value" table for r with df = n - 2 = 11 - 2 = 9
The critical values are -0.602 and +0.602 
Since 0.6631 > 0.602, r is significant. 
Decision: Reject Ho 

Conclusion: The correlation coefficient is significant 

Because r is significant and the scatter plot shows a reasonable linear trend, the regression line can be 
used to predict final exam scores. 

Example 3.10: Additional Practice Examples using Critical Values 

Suppose you computed the following correlation coefficients. Using the table at the end of the 
chapter, determine if r is significant and the line of best fit associated with each r can be used to 
predict a y value. If it helps, draw a number line. 

1. r = -0.567 and the sample size, n, is 19. The df = n - 2 = 17. The critical value is -0.456.
-0.567 < -0.456 so r is significant.

2. r = 0.708 and the sample size, n, is 9. The df = n - 2 = 7. The critical value is 0.666.
0.708 > 0.666 so r is significant.

3. r = 0.134 and the sample size, n, is 14. The df = 14 - 2 = 12. The critical value is 0.532.
0.134 is between -0.532 and 0.532 so r is not significant.

4. r = 0 and the sample size, n, is 5. No matter what the dfs are, r = 0 is between the two
critical values so r is not significant.
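
The critical value method used in the examples above can be sketched as a small lookup. This is only an illustration: the dictionary copies just the rows of the 95% Critical Values table that the examples need, and the function name is hypothetical.

```python
# 95% critical values of r, keyed by degrees of freedom (df = n - 2).
# Only the rows used in the examples above are copied from the table.
CRITICAL_VALUES = {3: 0.878, 4: 0.811, 7: 0.666, 8: 0.632,
                   9: 0.602, 12: 0.532, 17: 0.456}

def is_significant(r, n):
    """Return True when r falls outside the interval between the critical values."""
    critical = CRITICAL_VALUES[n - 2]
    return r < -critical or r > critical

# (r, n) pairs taken from Examples 3.7 to 3.10 and the third exam/final exam example
examples = [(0.801, 10), (-0.624, 14), (0.776, 6), (0.6631, 11),
            (-0.567, 19), (0.708, 9), (0.134, 14), (0.0, 5)]
for r, n in examples:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}, df = {n - 2}: {verdict}")
```

A sample size whose df is not in the dictionary raises a KeyError; to handle other sample sizes, add the corresponding rows from the table.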
3.7.2 Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are 
satisfied. The premise of this test is that the data are a sample of observed points taken from a larger 
population. We have not examined the entire population because it is not possible or feasible to do so. We 
are examining the sample to draw a conclusion about whether the linear relationship that we see between 
x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear 
relationship between x and y in the population. 

The regression line equation that we calculate from the sample data gives the best fit line for our particular 
sample. We want to use this best fit line for the sample as an estimate of the best fit line for the population. 
Examining the scatterplot and testing the significance of the correlation coefficient help us determine if it
is appropriate to do this.

The assumptions underlying the test of significance are: 

• There is a linear relationship in the population that models the average value of y for varying values
of x. In other words, the average of the y values for each particular x value lies on a straight line
in the population. (We do not know the equation for the line for the population. Our regression line
from the sample is our best estimate of this line in the population.)

• The y values for any particular x value are normally distributed about the line. This implies that
there are more y values scattered closer to the line than are scattered farther away. The first
assumption above implies that these normal distributions are centered on the line: the means of
these normal distributions of y values lie on the line.

• The standard deviations of the population y values about the line are equal for each value of x. In
other words, each of these normal distributions of y values has the same shape and spread about the
line.
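
The three assumptions above are often condensed into a single model statement. The symbols β0, β1, and σ below are notation introduced here for illustration only (they stand for the unknown population intercept, slope, and common standard deviation) and are not used elsewhere in this chapter:

```latex
y = \beta_0 + \beta_1 x + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^{2}) \quad \text{with the same } \sigma \text{ for every value of } x
```

Read this way, the mean of y at each x lies on the line β0 + β1x (first assumption), the scatter about the line is normal (second assumption), and the spread σ does not change with x (third assumption).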





Figure 3.15: The y values for each x value are normally distributed about the line with the same standard 
deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the 
line than are scattered further away from the line. 



**With contributions from Roberta Bloom 
3.8 Prediction 8

Recall the third exam/final exam example.

We examined the scatterplot and showed that the correlation coefficient is significant. We found the
equation of the best fit line for the final exam grade as a function of the grade on the third exam. We can now
use the least squares regression line for prediction.

Suppose you want to estimate, or predict, the final exam score of statistics students who received 73 on the
third exam. The exam scores (x-values) range from 65 to 75. Since 73 is between the x-values 65 and 75,
substitute x = 73 into the equation. Then:

ŷ = -173.51 + 4.83(73) = 179.08 (3.8)

We predict that statistics students who earn a grade of 73 on the third exam will earn a grade of 179.08 on
the final exam, on average.
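
The substitution above can be written as a short Python function. This is a sketch under stated assumptions: the function name is hypothetical, and refusing to predict outside the observed x range (65 to 75) is a design choice made here to reflect the range check described in the text.

```python
INTERCEPT = -173.51    # a in the line of best fit
SLOPE = 4.83           # b in the line of best fit
X_MIN, X_MAX = 65, 75  # range of third exam scores in the data

def predict_final(third_exam):
    """Predicted final exam score; only valid inside the observed x range."""
    if not X_MIN <= third_exam <= X_MAX:
        raise ValueError("x is outside the observed range; the line should not be used")
    return INTERCEPT + SLOPE * third_exam

print(round(predict_final(73), 2))
```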

Example 3.11 

Recall the third exam/final exam example.

Problem 1 

What would you predict the final exam score to be for a student who scored a 66 on the third 
exam? 

Solution 

145.27 

Problem 2 (Solution on p. 152.) 

What would you predict the final exam score to be for a student who scored a 78 on the third 
exam? 

**With contributions from Roberta Bloom 

3.9 Outliers 9 

In some data sets, there are values (observed data points) called outliers. Outliers are observed data 
points that are far from the least squares line. They have large "errors", where the "error" or residual is the 
vertical distance from the line to the point. 

Outliers need to be examined closely. Sometimes, for some reason or another, they should not be included 
in the analysis of the data. It is possible that an outlier is a result of erroneous data. Other times, an outlier 
may hold valuable information about the population under study and should remain included in the data. 
The key is to carefully examine what causes a data point to be an outlier. 

Besides outliers, a sample may contain one or a few points that are called influential points. Influential 
points are observed data points that are far from the other observed data points but that greatly influence 
the line. As a result an influential point may be close to the line, even though it is far from the rest of the 
data. Because an influential point so strongly influences the best fit line, it generally will not have a large 
"error" or residual. 

8 This content is available online at <http://cnx.org/content/m17095/1.7/>.
9 This content is available online at <http://cnx.org/content/m17094/1.13/>.
Computers and many calculators can be used to identify outliers from the data. Computer output for
regression analysis will often identify both outliers and influential points so that you can examine them. 

Identifying Outliers 

We could guess at outliers by looking at the scatterplot and best fit line. However, we would like
some guideline as to how far away a point needs to be in order to be considered an outlier. As a rough rule
of thumb, we can flag any point that is located further than two standard deviations above or below the
best fit line as an outlier. The standard deviation used is the standard deviation of the residuals or errors.

We can do this visually in the scatterplot by drawing an extra pair of lines that are two standard deviations 
above and below the best fit line. Any data points that are outside this extra pair of lines are flagged as 
potential outliers. Or we can do this numerically by calculating each residual and comparing it to twice the 
standard deviation. On the TI-83, 83+, or 84+, the graphical approach is easier. The graphical procedure 
is shown first, followed by the numerical calculations. You would generally only need to use one of these 
methods. 

Example 3.12 

In the third exam/final exam example, you can determine if there is an outlier or not. If there is 
an outlier, as an exercise, delete it and fit the remaining data to a new line. For this example, the 
new line ought to fit the remaining data better. This means the SSE should be smaller and the 
correlation coefficient ought to be closer to 1 or -1 . 

Solution 

Graphical Identification of Outliers 

With the TI-83, 83+, 84+ graphing calculators, it is easy to identify the outlier graphically and
visually. If we were to measure the vertical distance from any data point to the corresponding point
on the line of best fit and that distance was equal to 2s or farther, then we would consider the data
point to be "too far" from the line of best fit. We need to find and graph the lines that are two
standard deviations below and above the regression line. Any points that are outside these two
lines are outliers. We will call these lines Y2 and Y3.

As we did with the equation of the regression line and the correlation coefficient, we will use
technology to calculate this standard deviation for us. Using the LinRegTTest with this data,
scroll down through the output screens to find s = 16.412.

Line Y2 = -173.5 + 4.83x - 2(16.4) and line Y3 = -173.5 + 4.83x + 2(16.4)

where ŷ = -173.5 + 4.83x is the line of best fit. Y2 and Y3 have the same slope as the line of
best fit.

Graph the scatterplot with the best fit line in equation Y1, then enter the two extra lines as Y2 and
Y3 in the "Y=" equation editor and press ZOOM 9. You will find that the only data point that is not
between lines Y2 and Y3 is the point x = 65, y = 175. On the calculator screen it is just barely outside
these lines. The outlier is the student who had a grade of 65 on the third exam and 175 on the final
exam; this point is further than 2 standard deviations away from the best fit line.
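
To see how narrowly the point (65, 175) falls outside the band, the two extra lines can be evaluated directly. The sketch below uses the rounded values from the text (s = 16.4 and the line -173.5 + 4.83x); the function names are hypothetical.

```python
S = 16.4  # standard deviation of the residuals, rounded as in the text

def best_fit(x):
    return -173.5 + 4.83 * x

def lower_band(x):
    """Line Y2: two standard deviations below the line of best fit."""
    return best_fit(x) - 2 * S

def upper_band(x):
    """Line Y3: two standard deviations above the line of best fit."""
    return best_fit(x) + 2 * S

def outside_band(x, y):
    return y < lower_band(x) or y > upper_band(x)

print(round(upper_band(65), 2), outside_band(65, 175))
```

At x = 65 the upper line is at about 173.25, so the observed final exam score of 175 lies less than two points above it, which is why the point appears only barely outside the lines on the calculator screen.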

Sometimes a point is so close to the lines used to flag outliers on the graph that it is difficult to 
tell if the point is between or outside the lines. On a computer, enlarging the graph may help; on 
a small calculator screen, zooming in may make the graph more clear. Note that when the graph 
does not give a clear enough picture, you can use the numerical comparisons to identify outliers. 
Figure 3.16



Numerical Identification of Outliers 

In the table below, the first two columns are the third exam and final exam data. The third
column shows the predicted ŷ values calculated from the line of best fit: ŷ = -173.5 + 4.83x. The
residuals, or errors, have been calculated in the fourth column of the table: observed y value -
predicted y value = y - ŷ.

s is the standard deviation of all the y - ŷ = ε values where n = the total number of data points. If
each residual is calculated and squared, and the results are added up, we get the SSE. The standard
deviation of the residuals is calculated from the SSE as:

$s = \sqrt{\frac{SSE}{n-2}}$

Rather than calculate the value of s ourselves, we can find s using the computer or calculator. For
this example, our calculator LinRegTTest found s = 16.4 as the standard deviation of the residuals:
35; -17; 16; -6; -19; 9; 3; -1; -10; -9; -1.
x     y      ŷ      y - ŷ
65    175    140    175 - 140 = 35
67    133    150    133 - 150 = -17
71    185    169    185 - 169 = 16
71    163    169    163 - 169 = -6
66    126    145    126 - 145 = -19
75    198    189    198 - 189 = 9
67    153    150    153 - 150 = 3
70    163    164    163 - 164 = -1
71    159    169    159 - 169 = -10
69    151    160    151 - 160 = -9
69    159    160    159 - 160 = -1



Table 3.1 

We are looking for all data points for which the residual is greater than 2s=2(16.4)=32.8 or less than 
-32.8. Compare these values to the residuals in column 4 of the table. The only such data point is 
the student who had a grade of 65 on the third exam and 175 on the final exam; the residual for 
this student is 35. 
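
The numerical comparison just described can be sketched in Python. The data, the rounded predicted values, and s = 16.4 are taken from the table and text above; the variable names are illustrative.

```python
# (third exam, final exam) pairs and the rounded predictions from Table 3.1
observed = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
            (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]
predicted = [140, 150, 169, 169, 145, 189, 150, 164, 169, 160, 160]
S = 16.4  # standard deviation of the residuals from LinRegTTest

# residual = observed y - predicted y
residuals = [y - y_hat for (_, y), y_hat in zip(observed, predicted)]

# flag every point whose residual is more than 2s above or below the line
outliers = [point for point, e in zip(observed, residuals) if abs(e) > 2 * S]

print(residuals)
print(outliers)
```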

How does the outlier affect the best fit line? 

Numerically and graphically, we have identified the point (65, 175) as an outlier. We should
re-examine the data for this point to see if there are any problems with the data. If there is an error,
we should fix the error if possible, or delete the data. If the data is correct, we would leave it in
the data set. For this problem, we will suppose that we examined the data and found that this
outlier data was an error. Therefore, we will continue on to delete the outlier, so that we can
explore how it affects the results, as a learning experience.

Compute a new best-fit line and correlation coefficient using the 10 remaining points: 

On the TI-83, TI-83+, TI-84+ calculators, delete the outlier from L1 and L2. Using the LinRegTTest,
the new line of best fit and the correlation coefficient are:

ŷ = -355.19 + 7.39x and r = 0.9121

The new correlation, r = 0.9121, is stronger than the original (r = 0.6631) because r =
0.9121 is closer to 1. This means that the new line is a better fit to the 10 remaining data values.
The line can better predict the final exam score given the third exam score.
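
The effect of removing the outlier can also be checked without a calculator. The sketch below fits the least squares line with the usual sums of squares; the helper name fit_line is hypothetical and the code is an illustration, not the LinRegTTest routine itself.

```python
import math

def fit_line(points):
    """Return (a, b, r) for the least squares line y-hat = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_x        # y-intercept
    r = sxy / math.sqrt(sxx * syy) # correlation coefficient
    return a, b, r

data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

a_all, b_all, r_all = fit_line(data)
remaining = [p for p in data if p != (65, 175)]  # drop the outlier
a_new, b_new, r_new = fit_line(remaining)

print(round(a_all, 2), round(b_all, 2), round(r_all, 4))
print(round(a_new, 2), round(b_new, 2), round(r_new, 4))
```

The first line of output should reproduce the original fit and the second the fit on the 10 remaining points, up to rounding.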



Numerical Identification of Outliers: Calculating s and Finding Outliers Manually 

If you do not have the function LinRegTTest, then you can identify the outlier in the first example by
doing the following.

First, square each |y - ŷ| (see the table above):

The squares are 35²; 17²; 16²; 6²; 19²; 9²; 3²; 1²; 10²; 9²; 1²

Then, add (sum) all the |y - ŷ| squared terms using the formula

$\sum_{i=1}^{11} \left(|y_i - \hat{y}_i|\right)^{2} = \sum_{i=1}^{11} \varepsilon_i^{2}$ (Recall that $y_i - \hat{y}_i = \varepsilon_i$.)

= 35² + 17² + 16² + 6² + 19² + 9² + 3² + 1² + 10² + 9² + 1²
= 2440 = SSE. The result, SSE, is the Sum of Squared Errors.

Next, calculate s, the standard deviation of all the y - ŷ = ε values where n = the total number of data
points. (Calculate the standard deviation of 35; 17; 16; 6; 19; 9; 3; 1; 10; 9; 1.)

The calculation is $s = \sqrt{\frac{SSE}{n-2}}$

For the third exam/final exam problem, $s = \sqrt{\frac{2440}{11-2}} = 16.47$
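
The manual calculation of SSE and s can be reproduced in a few lines of Python. The residuals are the rounded values from the table; the variable names are illustrative.

```python
import math

# rounded residuals y - y-hat from the table
residuals = [35, -17, 16, -6, -19, 9, 3, -1, -10, -9, -1]

sse = sum(e ** 2 for e in residuals)  # Sum of Squared Errors
n = len(residuals)                    # number of data points
s = math.sqrt(sse / (n - 2))          # standard deviation of the residuals

print(sse, round(s, 2))
```

The value of s found this way is slightly larger than the s = 16.412 reported by LinRegTTest because the residuals in the table were computed from rounded predicted values.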

Next, multiply s by 1.9:
(1.9) · (16.47) = 31.29

31.29 is almost 2 standard deviations away from the mean of the y - ŷ values.

If we were to measure the vertical distance from any data point to the corresponding point on the line of
best fit and that distance is at least 1.9s, then we would consider the data point to be "too far" from the line
of best fit. We call that point a potential outlier.

For the example, if any of the |y - ŷ| values are at least 31.29, the corresponding (x, y) data point is a
potential outlier.

For the third exam/final exam problem, all the |y - ŷ|'s are less than 31.29 except for the first one, which is
35.

35 > 31.29. That is, |y - ŷ| > (1.9) · (s)

The point which corresponds to |y - ŷ| = 35 is (65, 175). Therefore, the data point (65, 175) is a potential
outlier. For this example, we will delete it. (Remember, we do not always delete an outlier.)

The next step is to compute a new best-fit line using the 10 remaining points. The new line of best
fit and the correlation coefficient are:

ŷ = -355.19 + 7.39x and r = 0.9121

Example 3.13 

Using this new line of best fit (based on the remaining 10 data points), what would a student 
who receives a 73 on the third exam expect to receive on the final exam? Is this the same as the 
prediction made using the original line? 
Solution

Using the new line of best fit, ŷ = -355.19 + 7.39(73) = 184.28. A student who scored 73 points on
the third exam would expect to earn 184 points on the final exam.

The original line predicted ŷ = -173.51 + 4.83(73) = 179.08, so the prediction using the new
line with the outlier eliminated differs from the original prediction.



Example 3.14 

(From The Consumer Price Indexes Web site) The Consumer Price Index (CPI) measures the average
change over time in the prices paid by urban consumers for consumer goods and services. The
CPI affects nearly all Americans because of the many ways it is used. One of its biggest uses is as 
a measure of inflation. By providing information about price changes in the Nation's economy to 
government, business, and labor, the CPI helps them to make economic decisions. The President, 
Congress, and the Federal Reserve Board use the CPI's trends to formulate monetary and fiscal 
policies. In the following table, x is the year and y is the CPI. 

Data: 



x       y
1915    10.1
1926    17.7
1935    13.7
1940    14.7
1947    24.1
1952    26.5
1964    31.0
1969    36.7
1975    49.3
1979    72.6
1980    82.4
1986    109.6
1991    130.7
1999    166.6



Table 3.2 



Problem 



Make a scatterplot of the data.

Calculate the least squares line. Write the equation in the form ŷ = a + bx.

Draw the line on the scatterplot.

Find the correlation coefficient. Is it significant?

What is the average CPI for the year 1990?
Solution

Scatter plot and line of best fit.

ŷ = -3204 + 1.662x is the equation of the line of best fit.

r = 0.8694

The number of data points is n = 14. Use the 95% Critical Values of the Sample Correlation
Coefficient table at the end of Chapter 12. n - 2 = 12. The corresponding critical value is
0.532. Since 0.8694 > 0.532, r is significant.

ŷ = -3204 + 1.662(1990) = 103.4 CPI

Using the calculator LinRegTTest, we find that s = 25.4; graphing the lines Y2 = -3204 + 1.662X -
2(25.4) and Y3 = -3204 + 1.662X + 2(25.4) shows that no data values are outside those lines,
identifying no outliers. (Note that the year 1999 was very close to the upper line, but still inside
it.)
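
The values quoted in this solution can be checked with a short standard-library sketch. The helper name fit_with_s is hypothetical, and the code is an illustration of the least squares formulas rather than the calculator's routine.

```python
import math

years = [1915, 1926, 1935, 1940, 1947, 1952, 1964,
         1969, 1975, 1979, 1980, 1986, 1991, 1999]
cpi = [10.1, 17.7, 13.7, 14.7, 24.1, 26.5, 31.0,
       36.7, 49.3, 72.6, 82.4, 109.6, 130.7, 166.6]

def fit_with_s(xs, ys):
    """Return (a, b, r, s) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard deviation of the residuals
    return a, b, r, s

a, b, r, s = fit_with_s(years, cpi)

# points lying more than 2s above or below the line (none are expected here)
flagged = [(x, y) for x, y in zip(years, cpi) if abs(y - (a + b * x)) > 2 * s]

print(round(a, 1), round(b, 3), round(r, 4), round(s, 1), flagged)
```

Because the code keeps the unrounded slope and intercept, a prediction for 1990 computed from a and b may differ by a few tenths from the 103.4 obtained with the rounded equation.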



[Scatter plot of CPI versus year with the line of best fit; horizontal axis: Year, 1900 to 2010]

Figure 3.17 



NOTE: In the example, notice the pattern of the points compared to the line. Although the
correlation coefficient is significant, the pattern in the scatterplot indicates that a curve would be a more
appropriate model to use than a line. In this example, a statistician should prefer to use other 
methods to fit a curve to this data, rather than model the data with the line we found. In addition 
to doing the calculations, it is always important to look at the scatterplot when deciding whether 
a linear model is appropriate. 

If you are interested in seeing more years of data, visit the Bureau of Labor Statistics CPI website 
ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt ; our data is taken from the column entitled 
"Annual Avg." (third column from the right). For example you could add more current years of 
data. Try adding the more recent years 2004 : CPI=188.9 and 2008 : CPI=215.3 and see how it 
affects the model. 

**With contributions from Roberta Bloom 
3.10 95% Critical Values of the Sample Correlation Coefficient Table 10

10 This content is available online at <http://cnx.org/content/m17098/1.5/>.
Degrees of Freedom: n - 2    Critical Values: (+ and -)
1                            0.997
2                            0.950
3                            0.878
4                            0.811
5                            0.754
6                            0.707
7                            0.666
8                            0.632
9                            0.602
10                           0.576
11                           0.555
12                           0.532
13                           0.514
14                           0.497
15                           0.482
16                           0.468
17                           0.456
18                           0.444
19                           0.433
20                           0.423
21                           0.413
22                           0.404
23                           0.396
24                           0.388
25                           0.381
26                           0.374
27                           0.367
28                           0.361
29                           0.355
30                           0.349
40                           0.304
50                           0.273
60                           0.250
70                           0.232
80                           0.217
90                           0.205
100 and over                 0.195



Table 3.3 
3.11 Summary

Bivariate Data: Each data point has two values. The form is (x, y).

Line of Best Fit or Least Squares Line (LSL): ŷ = a + bx

x = independent variable; y = dependent variable

Residual: Actual y value - predicted y value = y - ŷ

Correlation Coefficient r:

1. Used to determine whether a line of best fit is good for prediction.

2. Between -1 and 1 inclusive. The closer r is to 1 or -1, the closer the original points are to a straight line.

3. If r is negative, the slope is negative. If r is positive, the slope is positive.

4. If r = 0, then the line is horizontal.

Sum of Squared Errors (SSE): The smaller the SSE, the better the original set of points fits the line of best 
fit. 

Outlier: A point that does not seem to fit the rest of the data. 



11 This content is available online at <http://cnx.org/content/m17081/1.4/>.
3.12 Practice: Linear Regression 12
3.12.1 Student Learning Outcomes 

• The student will explore the properties of linear regression. 



3.12.2 Given 

The data below are real. Keep in mind that these are only reported figures. (Source: Centers for Disease 
Control and Prevention, National Center for HIV, STD, and TB Prevention, October 24, 2003) 

Adults and Adolescents only, United States 



Year        # AIDS cases diagnosed    # AIDS deaths
Pre-1981    91                        29
1981        319                       121
1982        1,170                     453
1983        3,076                     1,482
1984        6,240                     3,466
1985        11,776                    6,878
1986        19,032                    11,987
1987        28,564                    16,162
1988        35,447                    20,868
1989        42,674                    27,591
1990        48,634                    31,335
1991        59,660                    36,560
1992        78,530                    41,055
1993        78,834                    44,730
1994        71,874                    49,095
1995        68,505                    49,456
1996        59,347                    38,510
1997        47,149                    20,736
1998        38,393                    19,005
1999        25,174                    18,454
2000        25,522                    17,347
2001        25,643                    17,402
2002        26,464                    16,371
Total       802,118                   489,093



Table 3.4 



12 This content is available online at <http://cnx.org/content/m17088/1.8/>.
NOTE: We will use the columns "year" and "# AIDS cases diagnosed" for all questions unless
otherwise stated. 



3.12.3 Graphing 

Graph "year" vs. "# AIDS cases diagnosed." Plot the points on the graph located below in the section 
titled "Plot" . Do not include pre-1981. Label both axes with words. Scale both axes. 

3.12.4 Data 

Exercise 3.12.1 

Enter your data into your calculator or computer. The pre-1981 data should not be included. Why 
is that so? 



3.12.5 Linear Equation 

Write the linear equation below, rounding to 4 decimal places: 

Exercise 3.12.2 (Solution on p. 152.) 

Calculate the following: 

a. a =

b. b =

c. corr. =

d. n = (# of pairs)

Exercise 3.12.3 (Solution on p. 152.)

equation: ŷ =

3.12.6 Solve

Exercise 3.12.4 (Solution on p. 152.)

Solve.

a. When x = 1985, ŷ =

b. When x = 1990, ŷ =



3.12.7 Plot 

Plot the 2 above points on the graph below. Then, connect the 2 points to form the regression line. 
Obtain the graph on your calculator or computer.

3.12.8 Discussion Questions 

Look at the graph above. 

Exercise 3.12.5 

Does the line seem to fit the data? Why or why not? 

Exercise 3.12.6 

Do you think a linear fit is best? Why or why not? 

Exercise 3.12.7 

Hand draw a smooth curve on the graph above that shows the flow of the data. 

Exercise 3.12.8 

What does the correlation imply about the relationship between time (years) and the number of 
diagnosed AIDS cases reported in the U.S.? 

Exercise 3.12.9 

Why is "year" the independent variable and "# AIDS cases diagnosed" the dependent variable
(instead of the reverse)?

Exercise 3.12.10 (Solution on p. 152.) 

Solve. 



a. When x = 1970, ŷ =

b. Why doesn't this answer make sense? 
13



Exercise 3.13.1 (Solution on p. 152.) 

For each situation below, state the independent variable and the dependent variable. 

a. A study is done to determine if elderly drivers are involved in more motor vehicle fatalities
than all other drivers. The number of fatalities per 100,000 drivers is compared to the age of
drivers.

b. A study is done to determine if the weekly grocery bill changes based on the number of family
members.

c. Insurance companies base life insurance premiums partially on the age of the applicant.

d. Utility bills vary according to power consumption.

e. A study is done to determine if a higher education reduces the crime rate in a population.

Exercise 3.13.2 

In 1990 the number of driver deaths per 100,000 for the different age groups was as follows 
(Source: The National Highway Traffic Safety Administration's National Center for Statistics and 
Analysis): 



Age      Number of Driver Deaths per 100,000
15-24    28
25-39    15
40-69    10
70-79    15
80+      25



Table 3.5 



a. For each age group, pick the midpoint of the interval for the x value. (For the 80+ group, use
85.)

b. Using "ages" as the independent variable and "Number of driver deaths per 100,000" as the
dependent variable, make a scatter plot of the data.

c. Calculate the least squares (best-fit) line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant?

e. Pick two ages and find the estimated fatality rates.

f. Use the two points in (e) to plot the least squares line on your graph from (b).

g. Based on the above data, is there a linear relationship between age of a driver and driver fatality
rate?

h. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise 3.13.3 (Solution on p. 152.) 

The average number of people in a family that received welfare for various years is given below. 
(Source: House Ways and Means Committee, Health and Human Services Department) 



13 This content is available online at <http://cnx.org/content/m17085/1.13/>.
Year    Welfare family size
1969    4.0
1973    3.6
1975    3.2
1979    3.0
1983    3.0
1988    3.0
1991    2.9



Table 3.6 



a. Using "year" as the independent variable and "welfare family size" as the dependent variable,
make a scatter plot of the data.

b. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

c. Find the correlation coefficient. Is it significant?

d. Pick two years between 1969 and 1991 and find the estimated welfare family sizes.

e. Use the two points in (d) to plot the least squares line on your graph from (a).

f. Based on the above data, is there a linear relationship between the year and the average number
of people in a welfare family?

g. Using the least squares line, estimate the welfare family sizes for 1960 and 1995. Does the least
squares line give an accurate estimate for those years? Explain why or why not.

h. Are there any outliers in the above data?

i. What is the estimated average welfare family size for 1986? Does the least squares line give an
accurate estimate for that year? Explain why or why not.

j. What is the slope of the least squares (best-fit) line? Interpret the slope.



Exercise 3.13.4 

Use the AIDS data from the practice for this section (Section 3.12.2: Given), but this time use the 
columns "year #" and "# new AIDS deaths in U.S." Answer all of the questions from the practice 
again, using the new columns. 

Exercise 3.13.5 (Solution on p. 152.) 

The height (sidewalk to roof) of notable tall buildings in America is compared to the number of 
stories of the building (beginning at street level). (Source: Microsoft Bookshelf) 
Height (in feet)    Stories
1050                57
428                 28
362                 26
529                 40
790                 60
401                 22
380                 38
1454                110
1127                100
700                 46



Table 3.7 



a. Using "stories" as the independent variable and "height" as the dependent variable, make a
scatter plot of the data.

b. Does it appear from inspection that there is a relationship between the variables?

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant?

e. Find the estimated heights for 32 stories and for 94 stories.

f. Use the two points in (e) to plot the least squares line on your graph from (b).

g. Based on the above data, is there a linear relationship between the number of stories in tall
buildings and the height of the buildings?

h. Are there any outliers in the above data? If so, which point(s)?

i. What is the estimated height of a building with 6 stories? Does the least squares line give an
accurate estimate of height? Explain why or why not.

j. Based on the least squares line, adding an extra story adds about how many feet to a building?

k. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise 3.13.6 

Below is the life expectancy for an individual born in the United States in certain years. (Source: 
National Center for Health Statistics) 



Year of Birth    Life Expectancy
1930             59.7
1940             62.9
1950             70.2
1965             69.7
1973             71.4
1982             74.5
1987             75
1992             75.7

Table 3.8



a. Decide which variable should be the independent variable and which should be the dependent
variable.

b. Draw a scatter plot of the ordered pairs.

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant?

e. Find the estimated life expectancy for an individual born in 1950 and for one born in 1982.

f. Why aren't the answers to part (e) the values on the above chart that correspond to those years?

g. Use the two points in (e) to plot the least squares line on your graph from (b).

h. Based on the above data, is there a linear relationship between the year of birth and life
expectancy?

i. Are there any outliers in the above data?

j. Using the least squares line, find the estimated life expectancy for an individual born in 1850.
Does the least squares line give an accurate estimate for that year? Explain why or why not.

k. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise 3.13.7 (Solution on p. 153.) 

The percent of female wage and salary workers who are paid hourly rates is given below for the 
years 1979 - 1992. (Source: Bureau of Labor Statistics, U.S. Dept. of Labor) 



Year    Percent of workers paid hourly rates
1979    61.2
1980    60.7
1981    61.3
1982    61.3
1983    61.8
1984    61.7
1985    61.8
1986    62.0
1987    62.7
1990    62.8
1992    62.9

Table 3.9



a. Using "year" as the independent variable and "percent" as the dependent variable, make a 

scatter plot of the data. 

b. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant? 

e. Find the estimated percents for 1991 and 1988. 

f. Use the two points in (e) to plot the least squares line on your graph from (b).
g. Based on the above data, is there a linear relationship between the year and the percent of
female wage and salary earners who are paid hourly rates?
h. Are there any outliers in the above data?
i. What is the estimated percent for the year 2050? Does the least squares line give an accurate
estimate for that year? Explain why or why not.
j. What is the slope of the least squares (best-fit) line? Interpret the slope.



Exercise 3.13.8 

The maximum discount value of the Entertainment® card for the "Fine Dining" section, Edition 
10, for various pages is given below. 



Page number    Maximum value ($)
4              16
14             19
25             15
32             17
43             19
57             15
72             16
85             15
90             17

Table 3.10

a. Decide which variable should be the independent variable and which should be the dependent 

variable. 

b. Draw a scatter plot of the ordered pairs. 

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant? 

e. Find the estimated maximum values for the restaurants on page 10 and on page 70. 

f. Use the two points in (e) to plot the least squares line on your graph from (b). 

g. Does it appear that the restaurants giving the maximum value are placed in the beginning of 

the "Fine Dining" section? How did you arrive at your answer? 
h. Suppose that there were 200 pages of restaurants. What do you estimate to be the maximum 

value for a restaurant listed on page 200? 
i. Is the least squares line valid for page 200? Why or why not? 
j. What is the slope of the least squares (best-fit) line? Interpret the slope. 

The next two questions refer to the following data: The cost of a leading liquid laundry detergent in 
different sizes is given below. 



Size (ounces)    Cost ($)    Cost per ounce
16               3.99
32               4.99
64               5.99
200              10.99

Table 3.11



Exercise 3.13.9 (Solution on p. 153.)



a. Using "size" as the independent variable and "cost" as the dependent variable, make a scatter 

plot. 

b. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant? 

e. If the laundry detergent were sold in a 40 ounce size, find the estimated cost. 

f. If the laundry detergent were sold in a 90 ounce size, find the estimated cost.

g. Use the two points in (e) and (f) to plot the least squares line on your graph from (a). 
h. Does it appear that a line is the best way to fit the data? Why or why not? 

i. Are there any outliers in the above data? 

j. Is the least squares line valid for predicting what a 300 ounce size of the laundry detergent 

would cost? Why or why not? 
k. What is the slope of the least squares (best-fit) line? Interpret the slope. 

Exercise 3.13.10 

a. Complete the above table for the cost per ounce of the different sizes. 

b. Using "Size" as the independent variable and "Cost per ounce" as the dependent variable, 

make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is it significant? 

f. If the laundry detergent were sold in a 40 ounce size, find the estimated cost per ounce. 

g. If the laundry detergent were sold in a 90 ounce size, find the estimated cost per ounce. 
h. Use the two points in (f) and (g) to plot the least squares line on your graph from (b). 

i. Does it appear that a line is the best way to fit the data? Why or why not? 

j. Are there any outliers in the above data? 

k. Is the least squares line valid for predicting what a 300 ounce size of the laundry detergent 

would cost per ounce? Why or why not? 
l. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise 3.13.11 (Solution on p. 153.) 

According to a flyer by a Prudential Insurance Company representative, the approximate
probate fees and taxes for selected net taxable estates are as follows:



Net Taxable Estate ($)    Approximate Probate Fees and Taxes ($)
600,000                   30,000
750,000                   92,500
1,000,000                 203,000
1,500,000                 438,000
2,000,000                 688,000
2,500,000                 1,037,000
3,000,000                 1,350,000

Table 3.12

a. Decide which variable should be the independent variable and which should be the dependent
variable.

b. Make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is it significant? 

f. Find the estimated total cost for a net taxable estate of $1,000,000. Find the cost for $2,500,000. 

g. Use the two points in (f) to plot the least squares line on your graph from (b). 
h. Does it appear that a line is the best way to fit the data? Why or why not? 

i. Are there any outliers in the above data? 

j. Based on the above, what would be the probate fees and taxes for an estate that does not have 

any assets? 
k. What is the slope of the least squares (best-fit) line? Interpret the slope. 

Exercise 3.13.12 

The following are advertised sale prices of color televisions at Anderson's. 



Size (inches)    Sale Price ($)
9                147
20               197
27               297
31               447
35               1177
40               2177
60               2497

Table 3.13

a. Decide which variable should be the independent variable and which should be the dependent 

variable. 

b. Make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is it significant? 

f. Find the estimated sale price for a 32 inch television. Find the cost for a 50 inch television. 

g. Use the two points in (f) to plot the least squares line on your graph from (b). 
h. Does it appear that a line is the best way to fit the data? Why or why not? 

i. Are there any outliers in the above data? 

j. What is the slope of the least squares (best-fit) line? Interpret the slope. 



Exercise 3.13.13 (Solution on p. 153.) 

Below are the average heights for American boys. (Source: Physician's Handbook, 1990)

Age (years)    Height (cm)
birth          50.8
2              83.8
3              91.4
5              106.6
7              119.3
10             137.1
14             157.5

Table 3.14

a. Decide which variable should be the independent variable and which should be the dependent 

variable. 

b. Make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is it significant? 

f. Find the estimated average height for a one year-old. Find the estimated average height for an 

eleven year-old. 

g. Use the two points in (f) to plot the least squares line on your graph from (b). 
h. Does it appear that a line is the best way to fit the data? Why or why not? 

i. Are there any outliers in the above data? 

j. Use the least squares line to estimate the average height for a sixty-two year-old man. Do you 

think that your answer is reasonable? Why or why not? 
k. What is the slope of the least squares (best-fit) line? Interpret the slope. 

Exercise 3.13.14 

The following chart gives the gold medal times for every other Summer Olympics for the women's 
100 meter freestyle (swimming). 



Year    Time (seconds)
1912    82.2
1924    72.4
1932    66.8
1952    66.8
1960    61.2
1968    60.0
1976    55.65
1984    55.92
1992    54.64

Table 3.15

a. Decide which variable should be the independent variable and which should be the dependent
variable.

b. Make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is the decrease in times significant? 

f. Find the estimated gold medal time for 1932. Find the estimated time for 1984. 

g. Why are the answers from (f) different from the chart values?

h. Use the two points in (f) to plot the least squares line on your graph from (b). 
i. Does it appear that a line is the best way to fit the data? Why or why not? 

j. Use the least squares line to estimate the gold medal time for the next Summer Olympics. Do 
you think that your answer is reasonable? Why or why not? 

The next three questions use the following state information. 



State             # letters in name    Year entered the Union    Rank for entering the Union    Area (square miles)
Alabama           7                    1819                      22                             52,423
Colorado                               1876                      38                             104,100
Hawaii                                 1959                      50                             10,932
Iowa                                   1846                      29                             56,276
Maryland                               1788                      7                              12,407
Missouri                               1821                      24                             69,709
New Jersey                             1787                      3                              8,722
Ohio                                   1803                      17                             44,828
South Carolina    13                   1788                      8                              32,008
Utah                                   1896                      45                             84,904
Wisconsin                              1848                      30                             65,499

Table 3.16

Exercise 3.13.15 (Solution on p. 153.) 

We are interested in whether or not the number of letters in a state name depends upon the year 
the state entered the Union. 



a. Decide which variable should be the independent variable and which should be the dependent 

variable. 

b. Make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship? 

f. Find the estimated number of letters (to the nearest integer) a state would have if it entered
the Union in 1900. Find the estimated number of letters a state would have if it entered the
Union in 1940.
g. Use the two points in (f) to plot the least squares line on your graph from (b).
h. Does it appear that a line is the best way to fit the data? Why or why not? 

i. Use the least squares line to estimate the number of letters a new state that enters the Union this 
year would have. Can the least squares line be used to predict it? Why or why not? 

Exercise 3.13.16 

We are interested in whether there is a relationship between the ranking of a state and the area of 
the state. 

a. Let rank be the independent variable and area be the dependent variable. 

b. What do you think the scatter plot will look like? Make a scatter plot of the data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why 

not? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship? 

f. Find the estimated areas for Alabama and for Colorado. Are they close to the actual areas? 

g. Use the two points in (f) to plot the least squares line on your graph from (b). 
h. Does it appear that a line is the best way to fit the data? Why or why not? 

i. Are there any outliers? 

j. Use the least squares line to estimate the area of a new state that enters the Union. Can the least 

squares line be used to predict it? Why or why not? 
k. Delete "Hawaii" and substitute "Alaska" for it. Alaska is the fortieth state with an area of 

656,424 square miles. 
l. Calculate the new least squares line.
m. Find the estimated area for Alabama. Is it closer to the actual area with this new least squares 

line or with the previous one that included Hawaii? Why do you think that's the case? 
n. Do you think that, in general, newer states are larger than the original states? 

Exercise 3.13.17 (Solution on p. 154.) 

We are interested in whether there is a relationship between the rank of a state and the year it 
entered the Union. 

a. Let year be the independent variable and rank be the dependent variable. 

b. What do you think the scatter plot will look like? Make a scatter plot of the data. 

c. Why must the relationship be positive between the variables? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship? 

f. Let's say a fifty-first state entered the union. Based upon the least squares line, when should 

that have occurred? 

g. Using the least squares line, how many states do we currently have? 
h. Why isn't the least squares line a good estimator for this year? 

Exercise 3.13.18 

Below are the percents of the U.S. labor force (excluding self-employed and unemployed) that
are members of a union. We are interested in whether the decrease is significant. (Source: Bureau
of Labor Statistics, U.S. Dept. of Labor)

Year    Percent
1945    35.5
1950    31.5
1960    31.4
1970    27.3
1980    21.9
1986    17.5
1993    15.8

Table 3.17



a. Let year be the independent variable and percent be the dependent variable. 

b. What do you think the scatter plot will look like? Make a scatter plot of the data. 

c. Why will the relationship between the variables be negative? 

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship? 

f. Based on your answer to (e), do you think that the relationship can be said to be decreasing? 

g. If the trend continues, when will there no longer be any union members? Do you think that 

will happen? 

The next two questions refer to the following information: The data below reflects the 1991-92 Reunion 
Class Giving. (Source: SUNY Albany alumni magazine) 



Class Year    Average Gift    Total Giving
1922          41.67           125
1927          60.75           1,215
1932          83.82           3,772
1937          87.84           5,710
1947          88.27           6,003
1952          76.14           5,254
1957          52.29           4,393
1962          57.80           4,451
1972          42.68           18,093
1976          49.39           22,473
1981          46.87           20,997
1986          37.03           12,590

Table 3.18



Exercise 3.13.19 (Solution on p. 154.) 

We will use the columns "class year" and "total giving" for all questions, unless otherwise stated.

a. What do you think the scatter plot will look like? Make a scatter plot of the data.

b. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

c. Find the correlation coefficient. What does it imply about the significance of the relationship? 

d. For the class of 1930, predict the total class gift. 

e. For the class of 1964, predict the total class gift. 

f. For the class of 1850, predict the total class gift. Why doesn't this value make any sense? 

Exercise 3.13.20 

We will use the columns "class year" and "average gift" for all questions, unless otherwise stated. 

a. What do you think the scatter plot will look like? Make a scatter plot of the data. 

b. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

c. Find the correlation coefficient. What does it imply about the significance of the relationship? 

d. For the class of 1930, predict the average class gift. 

e. For the class of 1964, predict the average class gift. 

f. For the class of 2010, predict the average class gift. Why doesn't this value make any sense? 

Exercise 3.13.21 (Solution on p. 154.) 

We are interested in exploring the relationship between the weight of a vehicle and its fuel efficiency
(gasoline mileage). The data in the table show the weights, in pounds, and fuel efficiency,
measured in miles per gallon, for a sample of 12 vehicles.



Weight    Fuel Efficiency
2715      24
2570      28
2610      29
2750      38
3000      25
3410      22
3640      20
3700      26
3880      21
3900      18
4060      18
4710      15

Table 3.19



a. Graph a scatterplot of the data. 

b. Find the correlation coefficient and determine if it is significant. 

c. Find the equation of the best fit line. 

d. Write the sentence that interprets the meaning of the slope of the line in the context of the data. 

e. What percent of the variation in fuel efficiency is explained by the variation in the weight of the
vehicles, using the regression line? (State your answer in a complete sentence in the context
of the data.)
f. Accurately graph the best fit line on your scatterplot.

g. For the vehicle that weighs 3000 pounds, find the residual (y-yhat). Does the value predicted
by the line underestimate or overestimate the observed data value?
h. Identify any outliers, using either the graphical or numerical procedure demonstrated in the 

textbook. 
i. The outlier is a hybrid car that runs on gasoline and electric technology, but all other vehicles 

in the sample have engines that use gasoline only. Explain why it would be appropriate to 

remove the outlier from the data in this situation. Remove the outlier from the sample data. 

Find the new correlation coefficient, coefficient of determination, and best fit line. 
j. Compare the correlation coefficients and coefficients of determination before and after removing 

the outlier, and explain in complete sentences what these numbers indicate about how the 

model has changed. 
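
One way to check calculator work for this exercise is a short script. The following is a minimal sketch, not the textbook's own procedure (the text assumes a calculator or statistics package); it applies the standard least squares formulas to the data in Table 3.19. The function name `least_squares` and the variable names are illustrative choices.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (a, b, r) for the line yhat = a + b*x and the correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # y-intercept
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r

# Data from Table 3.19
weights = [2715, 2570, 2610, 2750, 3000, 3410, 3640, 3700, 3880, 3900, 4060, 4710]
mpg = [24, 28, 29, 38, 25, 22, 20, 26, 21, 18, 18, 15]

a, b, r = least_squares(weights, mpg)

# Residual (y - yhat) for the vehicle that weighs 3000 pounds (observed y = 25)
y_hat_3000 = a + b * 3000
residual = 25 - y_hat_3000

print(f"yhat = {a:.4f} + ({b:.5f})x, r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"residual at x = 3000: {residual:.2f}")
```

Because the script keeps unrounded coefficients, its residual can differ in the third decimal place from a hand calculation that uses rounded values of a and b.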

Exercise 3.13.22 (Solution on p. 154.) 

The four data sets below were created by statistician Francis Anscombe. They show why it is important
to examine the scatterplots for your data, in addition to finding the correlation coefficient,
in order to evaluate the appropriateness of fitting a linear model.



    Set 1             Set 2             Set 3             Set 4
  x      y          x      y          x      y          x      y
 10     8.04       10     9.14       10     7.46        8     6.58
  8     6.95        8     8.14        8     6.77        8     5.76
 13     7.58       13     8.74       13    12.74        8     7.71
  9     8.81        9     8.77        9     7.11        8     8.84
 11     8.33       11     9.26       11     7.81        8     8.47
 14     9.96       14     8.10       14     8.84        8     7.04
  6     7.24        6     6.13        6     6.08        8     5.25
  4     4.26        4     3.10        4     5.39       19    12.50
 12    10.84       12     9.13       12     8.15        8     5.56
  7     4.82        7     7.26        7     6.42        8     7.91
  5     5.68        5     4.74        5     5.73        8     6.89

Table 3.20



a. For each data set, find the least squares regression line and the correlation coefficient. What did 
you discover about the lines and values of r? 

For each data set, create a scatter plot and graph the least squares regression line. Use the graphs 
to answer the following questions: 

b. For which data set does it appear that a curve would be a more appropriate model than a line? 

c. Which data set has an influential point (point close to or on the line that greatly influences the 

best fit line)? 

d. Which data set has an outlier (obviously visible on the scatter plot with best fit line graphed)? 

e. Which data set appears to be the most appropriate to model using the least squares regression 

line? 
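
For readers working on a computer rather than a calculator, part (a) can be set up as a loop over the four data sets. This is a hedged sketch using the standard least squares formulas and the values in Table 3.20; the helper `least_squares` and the dictionary `data_sets` are illustrative names, not part of the text. It only prints the fitted values; the scatter plots for parts (b) through (e) still need to be drawn and inspected.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (a, b, r) for the line yhat = a + b*x and the correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r

# Sets 1 to 3 share the same x values; Set 4 has its own (Table 3.20)
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

data_sets = {
    "Set 1": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": (x_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: least_squares(xs, ys) for name, (xs, ys) in data_sets.items()}

for name, (a, b, r) in results.items():
    print(f"{name}: yhat = {a:.2f} + {b:.3f}x, r = {r:.3f}")
```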
3.13.1 Try these multiple choice questions

Exercise 3.13.23 (Solution on p. 155.)

A correlation coefficient of -0.95 means there is a ____________ between the two variables.



A. Strong positive correlation 

B. Weak negative correlation 

C. Strong negative correlation 

D. No Correlation 

Exercise 3.13.24 (Solution on p. 155.) 

According to the data reported by the New York State Department of Health regarding West Nile
Virus (http://www.health.state.ny.us/nysdoh/westnile/update/update.htm) for the years 2000-2008,
the least squares line equation for the number of reported dead birds (x) versus the number
of human West Nile virus cases (y) is ŷ = -10.2638 + 0.0491x. If the number of dead birds reported
in a year is 732, how many human cases of West Nile virus can be expected? r = 0.5490

A. No prediction can be made. 

B. 19.6 

C. 15 

D. 38.1 

The next three questions refer to the following data (showing the number of hurricanes by category to
directly strike the mainland U.S. each decade), obtained from www.nhc.noaa.gov/gifs/table6.gif. A major
hurricane is one with a strength rating of 3, 4, or 5.



Decade       Total Number of Hurricanes    Number of Major Hurricanes
1941-1950    24                            10
1951-1960    17                            8
1961-1970    14                            6
1971-1980    12                            4
1981-1990    15                            5
1991-2000    14                            5
2001-2004    9                             3

Table 3.21

Exercise 3.13.25 (Solution on p. 155.) 

Using only completed decades (1941 - 2000), calculate the least squares line for the number of 
major hurricanes expected based upon the total number of hurricanes. 



A. ŷ = -1.67x + 0.5
B. ŷ = 0.5x - 1.67
C. ŷ = 0.94x - 1.67
D. ŷ = -2x + 1



Exercise 3.13.26 (Solution on p. 155.)

The correlation coefficient is 0.942. Is this considered significant? Why or why not? 

A. No, because 0.942 is greater than the critical value of 0.707 

B. Yes, because 0.942 is greater than the critical value of 0.707 

C. No, because 0.942 is greater than the critical value of 0.811

D. Yes, because 0.942 is greater than the critical value of 0.811 
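
The two critical values quoted in the options can be related to the t distribution. As a hedged illustration (assuming a two-tailed test at the 5% level, which the exercise does not state explicitly), the test statistic t = r·sqrt(n - 2) / sqrt(1 - r²) with df = n - 2 can be solved for r, giving a critical r of t / sqrt(t² + df). The t values below are the usual two-tailed 5% entries from a standard t table; the function name `critical_r` is an illustrative choice.

```python
from math import sqrt

def critical_r(t_crit, df):
    """Critical correlation coefficient corresponding to a critical t value with df = n - 2."""
    return t_crit / sqrt(t_crit ** 2 + df)

# Two-tailed 5% critical t values from a standard t table (assumed significance level)
T_CRIT_5_PERCENT = {4: 2.776, 6: 2.447}

critical_values = {df: critical_r(t, df) for df, t in T_CRIT_5_PERCENT.items()}

for df, r_crit in critical_values.items():
    print(f"df = {df} (n = {df + 2}): critical r = {r_crit:.3f}")
```

Which of the two values applies depends on how many data points went into the regression, since df = n - 2.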

Exercise 3.13.27 (Solution on p. 155.) 

The data for 2001-2004 show 9 hurricanes have hit the mainland United States. The line of best fit 
predicts 2.83 major hurricanes to hit mainland U.S. Can the least squares line be used to make this 
prediction? 

A. No, because 9 lies outside the independent variable values 

B. Yes, because, in fact, there have been 3 major hurricanes this decade 

C. No, because 2.83 lies outside the dependent variable values 

D. Yes, because how else could we predict what is going to happen this decade. 

**Exercises 21 and 22 contributed by Roberta Bloom

3.14 Lab 1: Regression (Distance from School) 15

Class Time: 
Names: 

3.14.1 Student Learning Outcomes: 

• The student will calculate and construct the line of best fit between two variables. 

• The student will evaluate the relationship between two variables to determine if that relationship is 
significant. 



3.14.2 Collect the Data 

Use 8 members of your class for the sample. Collect bivariate data (distance an individual lives from school, 
the cost of supplies for the current term). 

1. Complete the table. 



Distance from school    Cost of supplies this term

Table 3.22

2. Which variable should be the dependent variable and which should be the independent variable? 
Why? 

3. Graph "distance" vs. "cost." Plot the points on the graph. Label both axes with words. Scale both 
axes. 



15 This content is available online at <http://cnx.org/content/m17080/1.10/>.

Figure 3.18



3.14.3 Analyze the Data 

Enter your data into your calculator or computer. Write the linear equation below, rounding to 4 decimal 
places. 

1. Calculate the following: 

a. a = 

b. b= 

c. correlation = 

d. n = 

e. equation: ŷ =

f. Is the correlation significant? Why or why not? (Answer in 1-3 complete sentences.) 

2. Supply an answer for the following scenarios:

a. For a person who lives 8 miles from campus, predict the total cost of supplies this term: 

b. For a person who lives 80 miles from campus, predict the total cost of supplies this term: 

3. Obtain the graph on your calculator or computer. Sketch the regression line below.

Figure 3.19



3.14.4 Discussion Questions 

1. Answer each with 1-3 complete sentences. 

a. Does the line seem to fit the data? Why? 

b. What does the correlation imply about the relationship between the distance and the cost? 

2. Are there any outliers? If so, which point is an outlier? 

3. Should the outlier, if it exists, be removed? Why or why not?

3.15 Lab 2: Regression (Textbook Cost) 16

Class Time:
Names:



3.15.1 Student Learning Outcomes: 

• The student will calculate and construct the line of best fit between two variables. 

• The student will evaluate the relationship between two variables to determine if that relationship is 
significant. 



3.15.2 Collect the Data 

Survey 10 textbooks. Collect bivariate data (number of pages in a textbook, the cost of the textbook). 

1. Complete the table. 



Number of pages    Cost of textbook

Table 3.23

2. Which variable should be the dependent variable and which should be the independent variable? 
Why? 

3. Graph "pages" vs. "cost." Plot the points on the graph in "Analyze the Data". Label both axes with
words. Scale both axes.



3.15.3 Analyze the Data 

Enter your data into your calculator or computer. Write the linear equation below, rounding to 4 decimal 
places. 

1. Calculate the following: 

a. a = 
b. b =

c. correlation = 

d. n = 



16 This content is available online at <http://cnx.org/content/m17087/1.9/>.

e. equation: ŷ =

f. Is the correlation significant? Why or why not? (Answer in 1-3 complete sentences.) 

2. Supply an answer for the following scenarios:

a. For a textbook with 400 pages, predict the cost: 

b. For a textbook with 600 pages, predict the cost: 

3. Obtain the graph on your calculator or computer. Sketch the regression line below.



Figure 3.20 



3.15.4 Discussion Questions 

1. Answer each with 1-3 complete sentences. 

a. Does the line seem to fit the data? Why? 

b. What does the correlation imply about the relationship between the number of pages and the cost? 

2. Are there any outliers? If so, which point(s) is an outlier? 

3. Should the outlier, if it exists, be removed? Why or why not?

3.16 Lab 3: Regression (Fuel Efficiency) 17

Class Time:
Names:



3.16.1 Student Learning Outcomes: 



• The student will calculate and construct the line of best fit between two variables. 

• The student will evaluate the relationship between two variables to determine if that relationship is
significant.



3.16.2 Collect the Data 

Use the most recent April issue of Consumer Reports. It will give the total fuel efficiency (in miles per 
gallon) and weight (in pounds) of new model cars with automatic transmissions. We will use this data to 
determine the relationship, if any, between the fuel efficiency of a car and its weight. 

1. Which variable should be the independent variable and which should be the dependent variable? 
Explain your answer in one or two complete sentences. 

2. Using your random number generator, randomly select 20 cars from the list and record their weights 
and fuel efficiency into the table below. 



Weight    Fuel Efficiency

Table 3.24

17 This content is available online at <http://cnx.org/content/m17079/1.8/>.

3. Which variable should be the dependent variable and which should be the independent variable? 
Why? 

4. By hand, do a scatterplot of "weight" vs. "fuel efficiency". Plot the points on graph paper. Label both 
axes with words. Scale both axes accurately. 



Figure 3.21 



3.16.3 Analyze the Data 

Enter your data into your calculator or computer. Write the linear equation below, rounding to 4 decimal 
places. 

1. Calculate the following: 

a. a = 

b. b = 

c. correlation = 

d. n = 

e. equation: ŷ =

2. Obtain the graph of the regression line on your calculator. Sketch the regression line on the same axes
as your scatterplot. 



3.16.4 Discussion Questions 

1. Is the correlation significant? Explain how you determined this in complete sentences. 

2. Is the relationship a positive one or a negative one? Explain how you can tell and what this means in 
terms of weight and fuel efficiency. 

3. In one or two complete sentences, what is the practical interpretation of the slope of the least squares 
line in terms of fuel efficiency and weight? 

4. For a car that weighs 4000 pounds, predict its fuel efficiency. Include units. 

5. Can we predict the fuel efficiency of a car that weighs 10000 pounds using the least squares line? 
Explain why or why not. 

6. Questions. Answer each in 1 to 3 complete sentences. 

a. Does the line seem to fit the data? Why or why not? 

b. What does the correlation imply about the relationship between fuel efficiency and weight of a 

car? Is this what you expected? 

7. Are there any outliers? If so, which point is an outlier? 
** This lab was designed and contributed by Diane Mathios.

Solutions to Exercises in Chapter 3



Solution to Example 3.11, Problem 2 (p. 115) 

The x values in the data are between 65 and 75. 78 is outside of the domain of the observed x values in 
the data (independent variable), so you cannot reliably predict the final exam score for this student. (Even 
though it is possible to enter x into the equation and calculate a y value, you should not do so!) 
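
The rule in this solution (predict only inside the range of observed x values) can be written as a simple guard. The sketch below is illustrative only: the bounds 65 and 75 come from the solution text, the value 78 is the one in question, and 70 is a hypothetical in-range value added for contrast. The function name `within_observed_range` is an assumption, not something from the text.

```python
def within_observed_range(x, x_min, x_max):
    """Return True if x lies inside the closed interval of observed x values."""
    return x_min <= x <= x_max

# Observed x values in the example run from 65 to 75
X_MIN, X_MAX = 65, 75

# 78 is the value from the example; 70 is a hypothetical in-range value
for x in (70, 78):
    if within_observed_range(x, X_MIN, X_MAX):
        print(f"x = {x}: inside the observed range, a prediction is reasonable")
    else:
        print(f"x = {x}: outside the observed range, do not rely on the line")
```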

Solutions to Practice: Linear Regression 

Solution to Exercise 3.12.2 (p. 127) 

a. a = -3,448,225 

b. b = 1750 

c. corr. = 0.4526 

d. n = 22 

Solution to Exercise 3.12.3 (p. 127) 

ŷ = -3,448,225 + 1750x
Solution to Exercise 3.12.4 (p. 127) 

a. 25082 

b. 33,831 

Solution to Exercise 3.12.10 (p. 128) 
a. -1164 

Solutions to Homework 
Solution to Exercise 3.13.1 (p. 129) 

a. Independent: Age; Dependent: Fatalities 

d. Independent: Power Consumption; Dependent: Utility 

Solution to Exercise 3.13.3 (p. 129) 

b. ŷ = 88.7206 - 0.0432x

c. -0.8533, Yes 
g. No 

h. No. 

i. 2.97, Yes 

j. slope = -0.0432. As the year increases by one, the welfare family size decreases by 0.0432 people. 

Solution to Exercise 3.13.5 (p. 130) 

b. Yes 

c. ŷ = 102.4287 + 11.7585x

d. 0.9436; yes 

e. 478.70 feet; 1207.73 feet 
g. Yes 

h. Yes; (57, 1050) 
i. 172.98; No 
j. 11.7585 feet
k. slope = 11.7585. As the number of stories increases by one, the height of the building increases by 11.7585
feet.

Solution to Exercise 3.13.7 (p. 132) 

b. Yes 

c. ŷ = -266.8863 + 0.1656x

d. 0.9448; Yes 

e. 62.9206; 62.4237 
h. No 

i. 72.639; No 

j. slope = 0.1656. As the year increases by one, the percent of workers paid hourly rates increases by 0.1656.

Solution to Exercise 3.13.9 (p. 134) 

b. Yes 

c. ŷ = 3.5984 + 0.0371x

d. 0.9986; Yes 

e. $5.08 

f. $6.93 
i. No 

j. Not valid 

k. slope = 0.0371. As the number of ounces increases by one, the cost of the liquid detergent increases by 
$0.0371 (or about 4 cents). 

Solution to Exercise 3.13.11 (p. 134) 

c. Yes 

d. ŷ = -337,424.6478 + 0.5463x

e. 0.9964; Yes 

f. $208,872.49; $1,028,318.20 
h. Yes 

i. No 

k. slope = 0.5463. As the net taxable estate increases by one dollar, the approximate probate fees and taxes
increase by 0.5463 dollars (about 55 cents).

Solution to Exercise 3.13.13 (p. 135) 

c. Yes 

d. ŷ = 65.0876 + 7.0948x

e. 0.9761; yes 

f. 72.2 cm; 143.13 cm 
h. Yes 

i. No 

j. 505.0 cm; No 

k. slope = 7.0948. As the age of an American boy increases by one year, the average height increases by 
7.0948 cm. 

Solution to Exercise 3.13.15 (p. 137) 

c. No 

d. ŷ = 47.03 - 0.216x

e. -0.4280 
f. 6; 5

Solution to Exercise 3.13.17 (p. 138) 

d. ŷ = -480.5845 + 0.2748x

e. 0.9553 

f. 1934 

Solution to Exercise 3.13.19 (p. 139) 

b. ŷ = -569,770.2796 + 296.0351x

c. 0.8302 

d. $1577.46 

e. $11,642.66 

f. -$22,105.34 

Solution to Exercise 3.13.21 (p. 140) 

b. r = -0.8, significant 

c. yhat = 48.4-0.00725x 

d. For every one pound increase in weight, the fuel efficiency decreases by 0.00725 miles per gallon. (For
every one thousand pound increase in weight, the fuel efficiency decreases by 7.25 miles per gallon.)

e. 64% of the variation in fuel efficiency is explained by the variation in weight using the regression line. 

g. yhat=48.4-0.00725(3000)=26.65 mpg. y-yhat=25-26.65=-1.65. Because yhat=26.65 is greater than y=25, the
line overestimates the observed fuel efficiency.

h. (2750,38) is the outlier. Be sure you know how to justify it using the requested graphical or numerical
methods, not just by guessing.
i. yhat = 42.4-0.00578x 
j. Without outlier, r=-0.885, rsquare=0.76; with outlier, r=-0.8, rsquare=0.64. The new linear model is a
better fit, after the outlier is removed from the data, because the new correlation coefficient is farther
from 0 and the new coefficient of determination is larger.
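The arithmetic in part g can be checked with a short Python sketch. The coefficients come from part c; the names `predict_mpg`, `INTERCEPT`, and `SLOPE` are ours, chosen only for illustration.

```python
# Least squares line from part c: yhat = 48.4 - 0.00725x (x = weight in pounds).
INTERCEPT = 48.4
SLOPE = -0.00725

def predict_mpg(weight_lb):
    """Return the predicted fuel efficiency (mpg) for a given weight in pounds."""
    return INTERCEPT + SLOPE * weight_lb

observed_mpg = 25                        # observed y for the 3000-pound car
predicted_mpg = predict_mpg(3000)        # yhat
residual = observed_mpg - predicted_mpg  # y - yhat

print(round(predicted_mpg, 2))
print(round(residual, 2))  # a negative residual means the line overestimates
```

A negative residual means the observed value lies below the line, which matches the conclusion stated in part g.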

Solution to Exercise 3.13.22 (p. 141) 

a. All four data sets have the same correlation coefficient r=0.816 and the same least squares regression line 
yhat=3+0.5x 

b. Set 2 ; c. Set 4 ; d. Set 3 ; e. Set 1 
[Figure: scatter plot of Data Set 2]
Solution to Exercise 3.13.23 (p. 142)
C 

Solution to Exercise 3.13.24 (p. 142) 
A 

Solution to Exercise 3.13.25 (p. 142) 
A 

Solution to Exercise 3.13.26 (p. 143) 
D 

Solution to Exercise 3.13.27 (p. 143) 
A 



Figure 3.22 
Chapter 4

Probability Topics 

4.1 Probability Topics 1 

4.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Understand and use the terminology of probability. 

• Determine whether two events are mutually exclusive and whether two events are independent. 

• Calculate probabilities using the Addition Rules and Multiplication Rules. 

• Construct and interpret Contingency Tables. 

• Construct and interpret Venn Diagrams (optional). 

• Construct and interpret Tree Diagrams (optional). 



4.1.2 Introduction 

It is often necessary to "guess" about the outcome of an event in order to make a decision. Politicians study 
polls to guess their likelihood of winning an election. Teachers choose a particular course of study based 
on what they think students can comprehend. Doctors choose the treatments needed for various diseases 
based on their assessment of likely results. You may have visited a casino where people play games chosen 
because of the belief that the likelihood of winning is good. You may have chosen your course of study 
based on the probable availability of jobs. 

You have, more than likely, used probability. In fact, you probably have an intuitive sense of probability. 
Probability deals with the chance of an event occurring. Whenever you weigh the odds of whether or not 
to do your homework or to study for an exam, you are using probability. In this chapter, you will learn to 
solve probability problems using a systematic approach. 

4.1.3 Optional Collaborative Classroom Exercise 

Your instructor will survey your class. Count the number of students in the class today. 

• Raise your hand if you have any change in your pocket or purse. Record the number of raised hands. 

• Raise your hand if you rode a bus within the past month. Record the number of raised hands. 

• Raise your hand if you answered "yes" to BOTH of the first two questions. Record the number of 
raised hands. 



1 This content is available online at <http://cnx.org/content/m16838/1.11/>.
Use the class data as estimates of the following probabilities. P(change) means the probability that a
randomly chosen person in your class has change in his/her pocket or purse. P(bus) means the probability that
a randomly chosen person in your class rode a bus within the last month, and so on. Discuss your answers.

• Find P(change). 

• Find P(bus). 

• Find P(change and bus). Find the probability that a randomly chosen student in your class has change
in his/her pocket or purse and rode a bus within the last month.

• Find P(change | bus). Find the probability that a randomly chosen student has change given that
he/she rode a bus within the last month. Count all the students that rode a bus. From the group
of students who rode a bus, count those who have change. The probability is equal to those who have
change and rode a bus divided by those who rode a bus.



4.2 Terminology 2 

Probability is a measure that is associated with how certain we are of outcomes of a particular experiment 
or activity. An experiment is a planned operation carried out under controlled conditions. If the result is 
not predetermined, then the experiment is said to be a chance experiment. Flipping one fair coin twice is 
an example of an experiment. 

The result of an experiment is called an outcome. A sample space is a set of all possible outcomes. Three 
ways to represent a sample space are to list the possible outcomes, to create a tree diagram, or to create a 
Venn diagram. The uppercase letter S is used to denote the sample space. For example, if you flip one fair 
coin, S = {H, T} where H = heads and T = tails are the outcomes. 

An event is any combination of outcomes. Upper case letters like A and B represent events. For example, 
if the experiment is to flip one fair coin, event A might be getting at most one head. The probability of an 
event A is written P (A). 

The probability of any outcome is the long-term relative frequency of that outcome. Probabilities are
between 0 and 1, inclusive (includes 0 and 1 and all numbers between these values). P(A) = 0 means
the event A can never happen. P(A) = 1 means the event A always happens. P(A) = 0.5 means the
event A is equally likely to occur or not to occur. For example, if you flip one fair coin repeatedly (from 20
to 2,000 to 20,000 times) the relative frequency of heads approaches 0.5 (the probability of heads).

Equally likely means that each outcome of an experiment occurs with equal probability. For example, if 
you toss a fair, six-sided die, each face (1, 2, 3, 4, 5, or 6) is as likely to occur as any other face. If you 
toss a fair coin, a Head (H) and a Tail (T) are equally likely to occur. If you randomly guess the answer to a
true/false question on an exam, you are equally likely to select a correct answer or an incorrect answer.

To calculate the probability of an event A when all outcomes in the sample space are equally likely,
count the number of outcomes for event A and divide by the total number of outcomes in the sample space.
For example, if you toss a fair dime and a fair nickel, the sample space is {HH, TH, HT, TT} where T =
tails and H = heads. The sample space has four outcomes. A = getting one head. There are two outcomes
{HT, TH}. P(A) = 2/4.
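The counting rule for equally likely outcomes can be sketched in Python by listing the sample space and counting. This is only an illustration of the dime-and-nickel example; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a dime and a nickel: each coin shows H or T.
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]

# Event A: getting exactly one head.
event_a = [outcome for outcome in sample_space if outcome.count("H") == 1]

# With equally likely outcomes, P(A) = (outcomes in A) / (outcomes in S).
p_a = Fraction(len(event_a), len(sample_space))

print(sample_space)
print(event_a)
print(p_a)
```

`Fraction` stores the result in lowest terms, so 2/4 is reported as 1/2.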



Suppose you roll one fair six-sided die, with the numbers {1,2,3,4,5,6} on its faces. Let event E = rolling a
number that is at least 5. There are two outcomes {5, 6}. P(E) = 2/6. If you were to roll the die only a few
times, you would not be surprised if your observed results did not match the probability. If you were to
roll the die a very large number of times, you would expect that, overall, 2/6 of the rolls would result in an
outcome of "at least 5". You would not expect exactly 2/6. The long-term relative frequency of obtaining
this result would approach the theoretical probability of 2/6 as the number of repetitions grows larger and
larger.

2 This content is available online at <http://cnx.org/content/m16845/1.13/>.

This important characteristic of probability experiments is known as the Law of Large Numbers: as
the number of repetitions of an experiment is increased, the relative frequency obtained in the experiment 
tends to become closer and closer to the theoretical probability. Even though the outcomes don't happen 
according to any set pattern or order, overall, the long-term observed relative frequency will approach the 
theoretical probability. (The word empirical is often used instead of the word observed.) The Law of Large 
Numbers will be discussed again in Chapter 7. 
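The Law of Large Numbers can be illustrated with a small simulation. The sketch below assumes a simulated fair die from Python's `random` module and a fixed seed so that runs are repeatable; the function name `relative_frequency` is ours.

```python
import random

def relative_frequency(num_rolls, seed=0):
    """Simulate num_rolls rolls of a fair die; return the fraction that are at least 5."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) >= 5)
    return hits / num_rolls

# The theoretical probability is 2/6. Larger runs tend to land closer to it.
for n in (20, 2_000, 200_000):
    print(n, relative_frequency(n))
```

A small run may land well away from 2/6, while the largest run should be close to it. That is the behavior the law describes.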

It is important to realize that in many situations, the outcomes are not equally likely. A coin or die may 
be unfair, or biased. Two math professors in Europe had their statistics students test the Belgian 1 Euro
coin and discovered that in 250 trials, a head was obtained 56% of the time and a tail was obtained 44% 
of the time. The data seem to show that the coin is not a fair coin; more repetitions would be helpful to 
draw a more accurate conclusion about such bias. Some dice may be biased. Look at the dice in a game you 
have at home; the spots on each face are usually small holes carved out and then painted to make the spots 
visible. Your dice may or may not be biased; it is possible that the outcomes may be affected by the slight 
weight differences due to the different numbers of holes in the faces. Gambling casinos have a lot of money 
depending on outcomes from rolling dice, so casino dice are made differently to eliminate bias. Casino dice 
have flat faces; the holes are completely filled with paint having the same density as the material that the 
dice are made out of so that each face is equally likely to occur. Later in this chapter we will learn techniques 
to use to work with probabilities for events that are not equally likely. 

"OR" Event: 

An outcome is in the event A OR B if the outcome is in A or is in B or is in both A and B. For example, let 
A = {1, 2, 3, 4, 5} and B = {4, 5, 6, 7, 8}. A OR B = {1, 2, 3, 4, 5, 6, 7, 8}. Notice that 4 and 5 are
NOT listed twice. 

"AND" Event: 

An outcome is in the event A AND B if the outcome is in both A and B at the same time. For example, let 
A and B be {1, 2, 3, 4, 5} and {4, 5, 6, 7, 8}, respectively. Then A AND B = {4,5}. 

The complement of event A is denoted A' (read "A prime"). A' consists of all outcomes that are NOT in A.
Notice that P(A) + P(A') = 1. For example, let S = {1, 2, 3, 4, 5, 6} and let A = {1, 2, 3, 4}. Then,
A' = {5, 6}. P(A) = 4/6, P(A') = 2/6, and P(A) + P(A') = 4/6 + 2/6 = 1.
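The "OR", "AND", and complement events correspond to set union, intersection, and difference. A minimal Python sketch using the sets from the examples above follows; the names `A2` and `A2_complement` are ours, and the probabilities assume equally likely outcomes.

```python
from fractions import Fraction

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

a_or_b = A | B    # union: outcomes in A, in B, or in both
a_and_b = A & B   # intersection: outcomes in both A and B

# Complement example: S = {1, ..., 6} and A = {1, 2, 3, 4}.
S = {1, 2, 3, 4, 5, 6}
A2 = {1, 2, 3, 4}
A2_complement = S - A2  # outcomes in S that are NOT in A2

p_a2 = Fraction(len(A2), len(S))
p_a2_complement = Fraction(len(A2_complement), len(S))

print(sorted(a_or_b), sorted(a_and_b), sorted(A2_complement))
print(p_a2 + p_a2_complement)
```

Because a set holds each element once, 4 and 5 appear only once in the union.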

The conditional probability of A given B is written P(A | B). P(A | B) is the probability that event A will
occur given that the event B has already occurred. A conditional reduces the sample space. We calculate
the probability of A from the reduced sample space B. The formula to calculate P(A | B) is

P(A | B) = P(A AND B) / P(B)

where P(B) is greater than 0.

For example, suppose we toss one fair, six-sided die. The sample space S = {1, 2, 3, 4, 5, 6}. Let A =
face is 2 or 3 and B = face is even (2, 4, 6). To calculate P(A | B), we count the number of outcomes 2 or 3 in
the sample space B = {2, 4, 6}. Then we divide that by the number of outcomes in B (and not S). Only the
outcome 2 qualifies, so P(A | B) = 1/3.

We get the same result by using the formula. Remember that S has 6 outcomes. 

P(A | B) = P(A AND B) / P(B)
         = [(the number of outcomes that are 2 or 3 and even in S) / 6] / [(the number of outcomes that are even in S) / 6]
         = (1/6) / (3/6) = 1/3
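Both ways of finding the conditional probability in the die example can be compared in a short Python sketch. The helper name `prob` is ours, and it assumes equally likely outcomes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 3}       # face is 2 or 3
B = {2, 4, 6}    # face is even

def prob(event, space):
    """Probability of an event when all outcomes in space are equally likely."""
    return Fraction(len(event & space), len(space))

# Method 1: count inside the reduced sample space B.
p_a_given_b_counting = prob(A, B)

# Method 2: the formula P(A | B) = P(A AND B) / P(B), with both terms computed in S.
p_a_given_b_formula = prob(A & B, S) / prob(B, S)

print(p_a_given_b_counting, p_a_given_b_formula)
```

The two methods agree because dividing both counts by the size of S does not change their ratio.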




Understanding Terminology and Symbols 

It is important to read each problem carefully to think about and understand what the events are.
Understanding the wording is the first very important step in solving probability problems. Reread the problem
several times if necessary. Clearly identify the event of interest. Determine whether there is a condition
stated in the wording that would indicate that the probability is conditional; carefully identify the
condition, if any.

Exercise 4.2.1 (Solution on p. 196.) 

In a particular college class, there are male and female students. Some students have long hair and 
some students have short hair. Write the symbols for the probabilities of the events for parts (a) 
through (j) below. (Note that you can't find numerical answers here. You were not given enough 
information to find any probability values yet; concentrate on understanding the symbols.) 

• Let F be the event that a student is female. 

• Let M be the event that a student is male. 

• Let S be the event that a student has short hair. 

• Let L be the event that a student has long hair. 

a. The probability that a student does not have long hair. 

b. The probability that a student is male or has short hair. 

c. The probability that a student is a female and has long hair. 

d. The probability that a student is male, given that the student has long hair. 

e. The probability that a student has long hair, given that the student is male. 

f. Of all the female students, the probability that a student has short hair.

g. Of all students with long hair, the probability that a student is female. 
h. The probability that a student is female or has long hair. 

i. The probability that a randomly selected student is a male student with short hair. 
j. The probability that a student is female. 

**With contributions from Roberta Bloom 

4.3 Independent and Mutually Exclusive Events 3 

Independent and mutually exclusive do not mean the same thing. 

4.3.1 Independent Events 

Two events are independent if the following are true: 

• P(A | B) = P(A)

• P(B | A) = P(B)

• P(A AND B) = P(A) • P(B)

Two events A and B are independent if the knowledge that one occurred does not affect the chance the
other occurs. For example, the outcomes of two rolls of a fair die are independent events. The outcome
of the first roll does not change the probability for the outcome of the second roll. To show two events are
independent, you must show only one of the above conditions. If two events are NOT independent, then
we say that they are dependent.

Sampling may be done with replacement or without replacement. 



3 This content is available online at <http://cnx.org/content/m16837/1.14/>.
• With replacement: If each member of a population is replaced after it is picked, then that member has
the possibility of being chosen more than once. When sampling is done with replacement, then events
are considered to be independent, meaning the result of the first pick will not change the probabilities
for the second pick.

• Without replacement: When sampling is done without replacement, then each member of a population
may be chosen only once. In this case, the probabilities for the second pick are affected by the
result of the first pick. The events are considered to be dependent or not independent.

If it is not known whether A and B are independent or dependent, assume they are dependent until you 
can show otherwise. 

4.3.2 Mutually Exclusive Events 

A and B are mutually exclusive events if they cannot occur at the same time. This means that A and B do 
not share any outcomes and P(A AND B) = 0. 

For example, suppose the sample space S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Let
A = {1, 2, 3, 4, 5}, B = {4, 5, 6, 7, 8}, and C = {7, 9}. A AND B = {4, 5}. P(A AND B) = 2/10
and is not equal to zero. Therefore, A and B are not mutually exclusive. A and C do not have any
numbers in common so P(A AND C) = 0. Therefore, A and C are mutually exclusive.
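The check for mutually exclusive events in this example can be sketched in Python. The helper names `p_and` and `mutually_exclusive` are ours, and the probabilities assume the ten outcomes in S are equally likely.

```python
from fractions import Fraction

S = set(range(1, 11))   # {1, 2, ..., 10}
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
C = {7, 9}

def p_and(x, y):
    """P(X AND Y) for equally likely outcomes in S."""
    return Fraction(len(x & y), len(S))

def mutually_exclusive(x, y):
    """Two events are mutually exclusive when they share no outcomes."""
    return x.isdisjoint(y)

print(p_and(A, B), mutually_exclusive(A, B))
print(p_and(A, C), mutually_exclusive(A, C))
```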



If it is not known whether A and B are mutually exclusive, assume they are not until you can show
otherwise.

The following examples illustrate these definitions and terms. 

Example 4.1 

Flip two fair coins. (This is an experiment.) 

The sample space is {HH, HT, TH, TT} where T = tails and H = heads. The outcomes are HH, 
HT, TH, and TT. The outcomes HT and TH are different. The HT means that the first coin 
showed heads and the second coin showed tails. The TH means that the first coin showed tails 
and the second coin showed heads. 

• Let A = the event of getting at most one tail. (At most one tail means 0 or 1 tail.) Then A can
be written as {HH, HT, TH}. The outcome HH shows 0 tails. HT and TH each show 1 tail.

• Let B = the event of getting all tails. B can be written as {TT}. B is the complement of A. So,
B = A'. Also, P(A) + P(B) = P(A) + P(A') = 1.

• The probabilities for A and for B are P(A) = 3/4 and P(B) = 1/4.

• Let C = the event of getting all heads. C = {HH}. Since B = {TT}, P(B AND C) = 0.
B and C are mutually exclusive. (B and C have no members in common because you cannot
have all tails and all heads at the same time.)

• Let D = event of getting more than one tail. D = {TT}. P(D) = 1/4.

• Let E = event of getting a head on the first flip. (This implies you can get either a head or tail
on the second flip.) E = {HT, HH}. P(E) = 2/4.

• Find the probability of getting at least one (1 or 2) tail in two flips. Let F = event of getting
at least one tail in two flips. F = {HT, TH, TT}. P(F) = 3/4.

Example 4.2 

Roll one fair 6-sided die. The sample space is {1, 2, 3, 4, 5, 6}. Let event A = a face is odd. Then 
A = {1, 3, 5}. Let event B = a face is even. Then B = {2, 4, 6}. 

• Find the complement of A, A'. The complement of A, A', is B because A and B together
make up the sample space. P(A) + P(B) = P(A) + P(A') = 1. Also, P(A) = 3/6 and P(B) = 3/6.

• Let event C = odd faces larger than 2. Then C = {3, 5}. Let event D = all even faces smaller
than 5. Then D = {2, 4}. P(C and D) = 0 because you cannot have an odd and even face at
the same time. Therefore, C and D are mutually exclusive events.

• Let event E = all faces less than 5. E = {1, 2, 3, 4}.

Problem (Solution on p. 196.) 

Are C and E mutually exclusive events? (Answer yes or no.) Why or why not? 

• Find P(C | A). This is a conditional. Recall that the event C is {3, 5} and event A is {1, 3, 5}.
To find P(C | A), find the probability of C using the sample space A. You have reduced the
sample space from the original sample space {1, 2, 3, 4, 5, 6} to {1, 3, 5}. So, P(C | A) = 2/3.



Example 4.3 

Let event G = taking a math class. Let event H = taking a science class. Then, G AND H = taking 
a math class and a science class. Suppose P(G) = 0.6, P(H) = 0.5, and P(G AND H) = 0.3. Are 
G and H independent? 

If G and H are independent, then you must show ONE of the following: 

• P(G | H) = P(G)

• P(H | G) = P(H)

• P(G AND H) = P(G) • P(H)

NOTE: The choice you make depends on the information you have. You could choose any of the 
methods here because you have the necessary information. 

Problem 1 

Show that P(G | H) = P(G).

Solution

P(G | H) = P(G AND H) / P(H) = 0.3 / 0.5 = 0.6 = P(G)


Problem 2 

Show P(G AND H) = P(G) • P(H).

Solution

P(G) • P(H) = 0.6 • 0.5 = 0.3 = P(G AND H)



Since G and H are independent, knowing that a person is taking a science class does not
change the chance that he/she is taking math. If the two events had not been independent (that
is, they are dependent) then knowing that a person is taking a science class would change the
chance he/she is taking math. For practice, show that P(H | G) = P(H) to show that G and H are
independent events.
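All three independence conditions for G and H can be checked at once in Python. This sketch uses exact fractions rather than floating-point numbers so that the equality tests are reliable; the dictionary `checks` is our own illustrative structure.

```python
from fractions import Fraction

p_g = Fraction("0.6")        # P(G): taking a math class
p_h = Fraction("0.5")        # P(H): taking a science class
p_g_and_h = Fraction("0.3")  # P(G AND H)

p_g_given_h = p_g_and_h / p_h   # P(G | H)
p_h_given_g = p_g_and_h / p_g   # P(H | G)

# Any one of these conditions is enough to show independence.
checks = {
    "P(G | H) = P(G)": p_g_given_h == p_g,
    "P(H | G) = P(H)": p_h_given_g == p_h,
    "P(G AND H) = P(G) * P(H)": p_g_and_h == p_g * p_h,
}
for name, holds in checks.items():
    print(name, holds)
```

The conditions are equivalent whenever P(G) and P(H) are greater than 0, so they either all hold or all fail.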

Example 4.4 

In a box there are 3 red cards and 5 blue cards. The red cards are marked with the numbers 1, 2, 
and 3, and the blue cards are marked with the numbers 1, 2, 3, 4, and 5. The cards are well-shuffled. 
You reach into the box (you cannot see into it) and draw one card. 

Let R = red card is drawn, B = blue card is drawn, E = even-numbered card is drawn. 

The sample space S = {R1, R2, R3, B1, B2, B3, B4, B5}. S has 8 outcomes.

• P(R) = 3/8. P(B) = 5/8. P(R AND B) = 0. (You cannot draw one card that is both red and blue.)

• P(E) = 3/8. (There are 3 even-numbered cards, R2, B2, and B4.)

• P(E | B) = 2/5. (There are 5 blue cards: B1, B2, B3, B4, and B5. Out of the blue cards, there are
2 even cards: B2 and B4.)

• P(B | E) = 2/3. (There are 3 even-numbered cards: R2, B2, and B4. Out of the even-numbered
cards, 2 are blue: B2 and B4.)

• The events R and B are mutually exclusive because P(R AND B) = 0.

• Let G = card with a number greater than 3. G = {B4, B5}. P(G) = 2/8. Let H = blue card
numbered between 1 and 4, inclusive. H = {B1, B2, B3, B4}. P(G | H) = 1/4. (The only card in
H that has a number greater than 3 is B4.) Since 2/8 = 1/4, P(G) = P(G | H) which means that
G and H are independent.
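The card probabilities in this example can be reproduced by listing the eight cards and counting. The Python sketch below is one possible way to do it; the helper `prob` and the predicate functions are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Each card is a (color, number) pair: 3 red cards and 5 blue cards.
cards = [("R", n) for n in (1, 2, 3)] + [("B", n) for n in (1, 2, 3, 4, 5)]

def prob(event, given=None):
    """P(event | given) for one card drawn at random; given=None means no condition."""
    reduced = [c for c in cards if given is None or given(c)]
    return Fraction(sum(1 for c in reduced if event(c)), len(reduced))

def is_blue(card):
    return card[0] == "B"

def is_even(card):
    return card[1] % 2 == 0

def in_g(card):  # number greater than 3
    return card[1] > 3

def in_h(card):  # blue card numbered 1 through 4
    return card[0] == "B" and 1 <= card[1] <= 4

print(prob(is_even), prob(is_even, is_blue), prob(is_blue, is_even))
print(prob(in_g), prob(in_g, in_h))  # equal values mean G and H are independent
```

Passing a condition as `given` shrinks the list of cards first, which mirrors the idea that a conditional reduces the sample space.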

Example 4.5 

In a particular college class, 60% of the students are female. 50% of all students in the class have
long hair. 45% of the students are female and have long hair. Of the female students, 75% have 
long hair. Let F be the event that the student is female. Let L be the event that the student has 
long hair. One student is picked randomly. Are the events of being female and having long hair 
independent? 

• The following probabilities are given in this example: 

• P(F) = 0.60; P(L) = 0.50

• P(F AND L) = 0.45

• P(L | F) = 0.75

NOTE: The choice you make depends on the information you have. You could use the first or
last condition on the list for this example. You do not know P(F | L) yet, so you cannot use the
second condition.

Solution 1 

Check whether P(F and L) = P(F)P(L): We are given that P(F and L) = 0.45 ; but P(F)P(L) = 
(0.60)(0.50)= 0.30 The events of being female and having long hair are not independent because 
P(F and L) does not equal P(F)P(L). 

Solution 2 

Check whether P(L | F) equals P(L): We are given that P(L | F) = 0.75 but P(L) = 0.50; they are not
equal. The events of being female and having long hair are not independent. 

Interpretation of Results 

The events of being female and having long hair are not independent; knowing that a student is 
female changes the probability that a student has long hair. 

**Example 5 contributed by Roberta Bloom 

4.4 Two Basic Rules of Probability 4 

4.4.1 The Multiplication Rule 

If A and B are two events defined on a sample space, then: P(A AND B) = P(B) • P(A | B).

This rule may also be written as: P(A | B) = P(A AND B) / P(B)



4 This content is available online at <http://cnx.org/content/m16847/1.11/>.
(The probability of A given B equals the probability of A and B divided by the probability of B.)

If A and B are independent, then P(A | B) = P(A). Then P(A AND B) = P(A | B) P(B) becomes
P(A AND B) = P(A) P(B).

4.4.2 The Addition Rule 

If A and B are defined on a sample space, then: P(A OR B) = P(A) + P(B) - P(A AND B). 

If A and B are mutually exclusive, then P(A AND B) = 0. Then P(A OR B) = P(A) + P(B) - P(A AND B) 
becomes P(A OR B) = P(A) + P(B). 

Example 4.6 

Klaus is trying to choose where to go on vacation. His two choices are: A = New Zealand and B
= Alaska.

• Klaus can only afford one vacation. The probability that he chooses A is P(A) = 0.6 and the
probability that he chooses B is P(B) = 0.35.

• P(A and B) = 0 because Klaus can only afford to take one vacation.

• Therefore, the probability that he chooses either New Zealand or Alaska is P(A OR B) = 
P(A) + P(B) = 0.6 + 0.35 = 0.95. Note that the probability that he does not choose to go 
anywhere on vacation must be 0.05. 

Example 4.7 

Carlos plays college soccer. He makes a goal 65% of the time he shoots. Carlos is going to attempt
two goals in a row in the next game.

A = the event Carlos is successful on his first attempt. P(A) = 0.65. B = the event Carlos is 
successful on his second attempt. P(B) = 0.65. Carlos tends to shoot in streaks. The probability 
that he makes the second goal GIVEN that he made the first goal is 0.90. 

Problem 1 

What is the probability that he makes both goals? 

Solution 

The problem is asking you to find P(A AND B) = P(B AND A). Since P(B | A) = 0.90:

P(B AND A) = P(B | A) P(A) = 0.90 * 0.65 = 0.585 (4.1)

Carlos makes the first and second goals with probability 0.585. 

Problem 2 

What is the probability that Carlos makes either the first goal or the second goal? 

Solution 

The problem is asking you to find P(A OR B). 

P(A OR B) = P(A) + P(B) - P(A AND B) = 0.65 + 0.65 - 0.585 = 0.715 (4.2) 

Carlos makes either the first goal or the second goal with probability 0.715. 

Problem 3 

Are A and B independent? 
Solution

No, they are not, because P(B AND A) = 0.585.

P(B) • P(A) = (0.65) • (0.65) = 0.423 (4.3)

0.423 ≠ 0.585 = P(B AND A) (4.4)

So, P(B AND A) is not equal to P(B) • P(A).

Problem 4 

Are A and B mutually exclusive? 

Solution 

No, they are not because P(A and B) = 0.585. 

To be mutually exclusive, P(A AND B) must equal 0. 
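The four answers in this example follow from the Multiplication Rule and the Addition Rule, and can be checked with a short Python sketch. It uses floating-point numbers, so comparisons go through `math.isclose`; the variable names are ours.

```python
import math

p_a = 0.65          # P(A): Carlos makes the first goal
p_b = 0.65          # P(B): Carlos makes the second goal
p_b_given_a = 0.90  # P(B | A)

# Multiplication Rule: P(A AND B) = P(B | A) * P(A)
p_a_and_b = p_b_given_a * p_a

# Addition Rule: P(A OR B) = P(A) + P(B) - P(A AND B)
p_a_or_b = p_a + p_b - p_a_and_b

# Independent only if P(A AND B) equals P(A) * P(B).
independent = math.isclose(p_a_and_b, p_a * p_b)

# Mutually exclusive only if P(A AND B) equals 0.
mutually_exclusive = math.isclose(p_a_and_b, 0.0, abs_tol=1e-12)

print(round(p_a_and_b, 3), round(p_a_or_b, 3), independent, mutually_exclusive)
```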



Example 4.8 

A community swim team has 150 members. Seventy-five of the members are advanced swimmers.
Forty-seven of the members are intermediate swimmers. The remainder are novice swimmers.
Forty of the advanced swimmers practice 4 times a week. Thirty of the intermediate swimmers
practice 4 times a week. Ten of the novice swimmers practice 4 times a week. Suppose one
member of the swim team is randomly chosen. Answer the questions (Verify the answers):

Problem 1 

What is the probability that the member is a novice swimmer? 

Solution 

28/150

Problem 2 

What is the probability that the member practices 4 times a week? 

Solution 

80/150

Problem 3 

What is the probability that the member is an advanced swimmer and practices 4 times a week? 

Solution 

40/150



Problem 4 

What is the probability that a member is an advanced swimmer and an intermediate swimmer? 
Are being an advanced swimmer and an intermediate swimmer mutually exclusive? Why or why 
not? 

Solution 

P(advanced AND intermediate) = 0, so these are mutually exclusive events. A swimmer cannot 
be an advanced swimmer and an intermediate swimmer at the same time. 
Problem 5

Are being a novice swimmer and practicing 4 times a week independent events? Why or why 
not? 



Solution 

No, these are not independent events. 



P(novice AND practices 4 times per week) = 0.0667 (4.5)

P(novice) • P(practices 4 times per week) = 0.0996 (4.6)

0.0667 ≠ 0.0996 (4.7)
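The swim team answers can be verified from the counts given in the example. The following Python sketch derives the number of novices as the remainder and then tests independence; the dictionary `practice4` and the other names are ours.

```python
from fractions import Fraction

total = 150
advanced, intermediate = 75, 47
novice = total - advanced - intermediate  # the remainder are novice swimmers

# Members who practice 4 times a week, by level.
practice4 = {"advanced": 40, "intermediate": 30, "novice": 10}

p_novice = Fraction(novice, total)
p_practice4 = Fraction(sum(practice4.values()), total)
p_novice_and_practice4 = Fraction(practice4["novice"], total)

# Independence would require P(novice AND practice4) = P(novice) * P(practice4).
independent = p_novice_and_practice4 == p_novice * p_practice4

print(novice, p_novice, p_practice4, independent)
print(round(float(p_novice_and_practice4), 4), round(float(p_novice * p_practice4), 4))
```

The fractions print in lowest terms, so 28/150 appears as 14/75 and 80/150 as 8/15.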



Example 4.9 

Studies show that, if she lives to be 90, about 1 woman in 7 (approximately 14.3%) will develop 
breast cancer. Suppose that of those women who develop breast cancer, a test is negative 2% of the 
time. Also suppose that in the general population of women, the test for breast cancer is negative 
about 85% of the time. Let B = woman develops breast cancer and let N = tests negative. Suppose 
one woman is selected at random. 

Problem 1 

What is the probability that the woman develops breast cancer? What is the probability that 
woman tests negative? 

Solution 

P(B) = 0.143 ; P(N) = 0.85 

Problem 2 

Given that the woman has breast cancer, what is the probability that she tests negative? 

Solution 

P(N | B) = 0.02

Problem 3 

What is the probability that the woman has breast cancer AND tests negative? 

Solution 

P(B AND N) = P(B) • P(N | B) = (0.143) • (0.02) = 0.0029

Problem 4 

What is the probability that the woman has breast cancer or tests negative? 

Solution 

P(B OR N) = P(B) + P(N) - P(B AND N) = 0.143 + 0.85 - 0.0029 = 0.9901 
Problem 5

Are having breast cancer and testing negative independent events? 

Solution 

No. P(N) = 0.85; P(N | B) = 0.02. So, P(N | B) does not equal P(N).

Problem 6 

Are having breast cancer and testing negative mutually exclusive? 

Solution 

No. P(B AND N) = 0.0029. For B and N to be mutually exclusive, P(B AND N) must be 0. 



4.5 Contingency Tables 5 

A contingency table provides a way of portraying data that can facilitate calculating probabilities. The table
helps in determining conditional probabilities quite easily. The table displays sample values in relation
to two different variables that may be dependent or contingent on one another. Later on, we will use
contingency tables again, but in another manner.

Example 4.10 

Suppose a study of speeding violations and drivers who use car phones produced the following 
fictional data: 





                       Speeding violation   No speeding violation   Total
                       in the last year     in the last year

Car phone user         25                   280                     305

Not a car phone user   45                   405                     450

Total                  70                   685                     755



Table 4.1 

The total number of people in the sample is 755. The row totals are 305 and 450. The column totals 
are 70 and 685. Notice that 305 + 450 = 755 and 70 + 685 = 755. 

Calculate the following probabilities using the table 

Problem 1 

P(person is a car phone user) =

Solution

(number of car phone users) / (total number in study) = 305/755



Problem 2 

P(person had no violation in the last year) =

5 This content is available online at <http://cnx.org/content/m16835/1.12/>.
Solution

(number that had no violation) / (total number in study) = 685/755



Problem 3 

P(person had no violation in the last year AND was a car phone user) 

Solution 

280/755



Problem 4 

P(person is a car phone user OR person had no violation in the last year) 

Solution 

(305/755 + 685/755) - 280/755 = 710/755



Problem 5 

P(person is a car phone user GIVEN person had a violation in the last year) = 

Solution 

25/70 (The sample space is reduced to the number of persons who had a violation.)



Problem 6 

P(person had no violation last year GIVEN person was not a car phone user) = 

Solution 

405/450 (The sample space is reduced to the number of persons who were not car phone users.)
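The six answers in this example can be reproduced from the four interior cells of Table 4.1. The Python sketch below stores the cells in a dictionary and sums whichever cells match; the labels and the helper name `count` are ours.

```python
from fractions import Fraction

# Counts from Table 4.1, keyed by (phone use, violation status).
counts = {
    ("user", "violation"): 25,
    ("user", "no violation"): 280,
    ("non-user", "violation"): 45,
    ("non-user", "no violation"): 405,
}
total = sum(counts.values())

def count(phone=None, violation=None):
    """Sum the cells that match the given row and/or column label."""
    return sum(
        n for (p, v), n in counts.items()
        if (phone is None or p == phone) and (violation is None or v == violation)
    )

p_user = Fraction(count(phone="user"), total)
p_no_violation = Fraction(count(violation="no violation"), total)
p_both = Fraction(count("user", "no violation"), total)
p_either = p_user + p_no_violation - p_both  # Addition Rule

# Conditional probabilities divide by the reduced sample space, not by the total.
p_user_given_violation = Fraction(count("user", "violation"), count(violation="violation"))
p_no_violation_given_non_user = Fraction(
    count("non-user", "no violation"), count(phone="non-user")
)

print(p_user, p_no_violation, p_both, p_either)
print(p_user_given_violation, p_no_violation_given_non_user)
```

`Fraction` reduces each answer to lowest terms, so 710/755 is shown as 142/151 and 25/70 as 5/14.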



Example 4.11 

The following table shows a random sample of 100 hikers and the areas of hiking preferred: 

Hiking Area Preference 



Sex      The Coastline   Near Lakes and Streams   On Mountain Peaks   Total

Female   18              16                       ___                 45

Male     ___             ___                      14                  55

Total    ___             41                       ___                 ___









Table 4.2 



Problem 1 (Solution on p. 196.)

Complete the table. 

Problem 2 (Solution on p. 196.) 

Are the events "being female" and "preferring the coastline" independent events? 

Let F = being female and let C = preferring the coastline. 
a. P(F AND C) = 
b. P(F) • P(C) =

Are these two numbers the same? If they are, then F and C are independent. If they are not, then 
F and C are not independent. 

Problem 3 (Solution on p. 196.) 

Find the probability that a person is male given that the person prefers hiking near lakes and 
streams. Let M = being male and let L = prefers hiking near lakes and streams. 

a. What word tells you this is a conditional? 

b. Fill in the blanks and calculate the probability: P(___ | ___) = ___.

c. Is the sample space for this problem all 100 hikers? If not, what is it? 

Problem 4 (Solution on p. 196.) 

Find the probability that a person is female or prefers hiking on mountain peaks. Let F = being 
female and let P = prefers mountain peaks. 

a. P(F) = 

b. P(P) = 

c. P(F AND P) =

d. Therefore, P(F OR P) = 



Example 4.12 

Muddy Mouse lives in a cage with 3 doors. If Muddy goes out the first door, the probability that
he gets caught by Alissa the cat is 1/5 and the probability he is not caught is 4/5. If he goes out the
second door, the probability he gets caught by Alissa is 1/4 and the probability he is not caught is 3/4.
The probability that Alissa catches Muddy coming out of the third door is 1/2 and the probability
she does not catch Muddy is 1/2. It is equally likely that Muddy will choose any of the three doors,
so the probability of choosing each door is 1/3.

Door Choice

Caught or Not   Door One   Door Two   Door Three   Total
Caught          1/15       1/12       1/6          ___
Not Caught      4/15       3/12       1/6          ___
Total           ___        ___        ___          1

Table 4.3



The first entry 1/15 = (1/5)(1/3) is P(Door One AND Caught).
The entry 4/15 = (4/5)(1/3) is P(Door One AND Not Caught).



Verify the remaining entries. 

Problem 1 (Solution on p. 196.) 

Complete the probability contingency table. Calculate the entries for the totals. Verify that the 
lower-right corner entry is 1. 

Problem 2 

What is the probability that Alissa does not catch Muddy?

Solution

41/60



Problem 3 

What is the probability that Muddy chooses Door One OR Door Two given that Muddy is caught 
by Alissa? 



Solution 

9/19



NOTE: You could also do this problem by using a probability tree. See the Tree Diagrams (Optional)
(Section 4.7) section of this chapter for examples.
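As a cross-check on Example 4.12, the joint entries and totals can be computed directly. This is a minimal sketch: the variable names are illustrative, and the conditional probabilities 1/5, 1/4, and 1/2 are the ones implied by the joint entries 1/15, 1/12, and 1/6 when each door has probability 1/3.

```python
from fractions import Fraction

p_door = Fraction(1, 3)  # each of the three doors is equally likely

# P(Caught | door), as implied by the joint entries of the table.
p_caught_given_door = {
    "one": Fraction(1, 5),
    "two": Fraction(1, 4),
    "three": Fraction(1, 2),
}

# Joint probabilities: P(door AND caught) and P(door AND not caught).
caught = {d: p_door * p for d, p in p_caught_given_door.items()}
not_caught = {d: p_door * (1 - p) for d, p in p_caught_given_door.items()}

# Row totals of the table.
p_caught = sum(caught.values())
p_not_caught = sum(not_caught.values())  # Problem 2

# Problem 3: P(Door One OR Door Two | Caught).
p_one_or_two_given_caught = (caught["one"] + caught["two"]) / p_caught

print(p_caught, p_not_caught, p_one_or_two_given_caught)
```

The two row totals add to 1, which is the lower-right corner entry of the table.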



4.6 Venn Diagrams (optional) 6 

A Venn diagram is a picture that represents the outcomes of an experiment. It generally consists of a box 
that represents the sample space S together with circles or ovals. The circles or ovals represent events. 

Example 4.13 

Suppose an experiment has the outcomes 1, 2, 3, ... , 12 where each outcome has an equal chance 
of occurring. Let event A = {1, 2, 3, 4, 5, 6} and event B = {6, 7, 8, 9}. Then A AND B = {6}
and A OR B = {1, 2, 3, 4, 5, 6, 7, 8, 9}. The Venn diagram is as follows: 





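[Figure: Venn diagram for Example 4.13. Oval A contains 1 through 6, oval B contains 6 through 9, the outcome 6 lies in the overlap, and 10, 11, and 12 lie outside both ovals inside S.]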
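The set operations in Example 4.13 map directly onto Python's built-in `set` type, where `&` is AND (intersection) and `|` is OR (union). The sketch below is illustrative only; the helper name `prob` is an assumption, and it relies on the example's statement that all 12 outcomes are equally likely.

```python
from fractions import Fraction

S = set(range(1, 13))       # sample space: outcomes 1 through 12
A = {1, 2, 3, 4, 5, 6}
B = {6, 7, 8, 9}

a_and_b = A & B             # outcomes in both A and B
a_or_b = A | B              # outcomes in A, in B, or in both
outside = S - a_or_b        # outcomes in neither oval of the Venn diagram


def prob(event):
    """Probability of an event when every outcome in S is equally likely."""
    return Fraction(len(event), len(S))


print(sorted(a_and_b), sorted(a_or_b), sorted(outside))
print(prob(A), prob(B), prob(a_and_b), prob(a_or_b))
```

Counting the overlap {6} only once is what makes P(A OR B) equal to P(A) + P(B) - P(A AND B).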



Example 4.14 

Flip 2 fair coins. Let A = tails on the first coin. Let B = tails on the second coin. Then A = {TT, TH}
and B = {TT, HT}. Therefore, A AND B = {TT}. A OR B = {TH, TT, HT}.

The sample space when you flip two fair coins is S = {HH, HT, TH, TT}. The outcome HH is in 
neither A nor B. The Venn diagram is as follows: 



[Figure: Venn diagram for Example 4.14. Oval A contains TH and TT, oval B contains TT and HT, TT lies in the overlap, and HH lies outside both ovals inside S.]

6 This content is available online at <http://cnx.org/content/m16848/1.12/>.



Example 4.15 

Forty percent of the students at a local college belong to a club and 50% work part time. Five 
percent of the students work part time and belong to a club. Draw a Venn diagram showing the 
relationships. Let C = student belongs to a club and PT = student works part time. 




If a student is selected at random, find:

• The probability that the student belongs to a club. P(C) = 0.40.

• The probability that the student works part time. P(PT) = 0.50.

• The probability that the student belongs to a club AND works part time. P(C AND PT) = 0.05.

• The probability that the student belongs to a club given that the student works part time.

P(C | PT) = P(C AND PT) / P(PT) = 0.05 / 0.50 = 0.1    (4.8)

• The probability that the student belongs to a club OR works part time.

P(C OR PT) = P(C) + P(PT) - P(C AND PT) = 0.40 + 0.50 - 0.05 = 0.85    (4.9)
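The numbers in Example 4.15 also determine the four regions of the Venn diagram. The following sketch works them out with exact fractions; the variable names are illustrative, and the independence check at the end is an extra observation rather than part of the example.

```python
from fractions import Fraction

p_c = Fraction("0.40")         # P(C): belongs to a club
p_pt = Fraction("0.50")        # P(PT): works part time
p_c_and_pt = Fraction("0.05")  # P(C AND PT)

# Conditional probability and addition rule, as in (4.8) and (4.9).
p_c_given_pt = p_c_and_pt / p_pt
p_c_or_pt = p_c + p_pt - p_c_and_pt

# Regions of the Venn diagram.
club_only = p_c - p_c_and_pt        # in C but not PT
part_time_only = p_pt - p_c_and_pt  # in PT but not C
neither = 1 - p_c_or_pt             # outside both ovals

# C and PT would be independent only if P(C AND PT) = P(C)P(PT).
independent = p_c_and_pt == p_c * p_pt

print(float(p_c_given_pt), float(p_c_or_pt))
print(float(club_only), float(part_time_only), float(neither), independent)
```

The four regions (club only, part time only, both, neither) add to 1, which is a quick way to check a hand-drawn diagram.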



4.7 Tree Diagrams (optional) 7 

A tree diagram is a special type of graph used to determine the outcomes of an experiment. It consists of
"branches" that are labeled with either frequencies or probabilities. Tree diagrams can make some
probability problems easier to visualize and solve. The following example illustrates how to use a tree diagram.

7 This content is available online at <http://cnx.org/content/m16846/1.10/>.

Example 4.16

In an urn, there are 11 balls. Three balls are red (R) and 8 balls are blue (B). Draw two balls, one 
at a time, with replacement. "With replacement" means that you put the first ball back in the urn 
before you select the second ball. The tree diagram using frequencies that show all the possible 
outcomes follows. 




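[Figure 4.1: tree diagram. The first set of branches is labeled 1st Draw and the second set 2nd Draw; the branch ends show the frequencies 64 BB, 24 BR, 24 RB, and 9 RR.]

Figure 4.1: Total = 64 + 24 + 24 + 9 = 121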



The first set of branches represents the first draw. The second set of branches represents the second 
draw. Each of the outcomes is distinct. In fact, we can list each red ball as R1, R2, and R3 and each
blue ball as B1, B2, B3, B4, B5, B6, B7, and B8. Then the 9 RR outcomes can be written as:

R1R1; R1R2; R1R3; R2R1; R2R2; R2R3; R3R1; R3R2; R3R3 

The other outcomes are similar. 

There are a total of 11 balls in the urn. Draw two balls, one at a time, and with replacement. There 
are 11 • 11 = 121 outcomes, the size of the sample space. 

Problem 1 (Solution on p. 197.) 

List the 24 BR outcomes: B1R1, B1R2, B1R3, ... 

Problem 2 

Using the tree diagram, calculate P(RR). 

Solution 



P(RR) = (3/11)(3/11) = 9/121








Problem 3

Using the tree diagram, calculate P(RB OR BR).

Solution

P(RB OR BR) = (3/11)(8/11) + (8/11)(3/11) = 48/121

Problem 4

Using the tree diagram, calculate P(R on 1st draw AND B on 2nd draw). 

Solution 

P(R on 1st draw AND B on 2nd draw) = P(RB) = (3/11)(8/11) = 24/121



Problem 5 

Using the tree diagram, calculate P(R on 2nd draw given B on 1st draw). 

Solution 

P(R on 2nd draw given B on 1st draw) = P(R on 2nd | B on 1st) = 24/88 = 3/11

This problem is a conditional. The sample space has been reduced to those outcomes that already
have a blue on the first draw. There are 24 + 64 = 88 possible outcomes (24 BR and 64 BB).
Twenty-four of the 88 possible outcomes are BR. 24/88 = 3/11.



Problem 6 (Solution on p. 197.) 

Using the tree diagram, calculate P(BB). 

Problem 7 (Solution on p. 197.) 

Using the tree diagram, calculate P(B on the 2nd draw given R on the first draw). 
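The frequencies on the tree in Example 4.16 can be reproduced by listing all ordered pairs of draws. This is a sketch under the example's labeling (R1 to R3 and B1 to B8); the helper names `color` and `prob` are illustrative. `itertools.product` with `repeat=2` models drawing with replacement, because the same ball may appear in both positions.

```python
from fractions import Fraction
from itertools import product

# Label the balls as in the example: R1..R3 and B1..B8.
balls = [f"R{i}" for i in range(1, 4)] + [f"B{i}" for i in range(1, 9)]

# With replacement: every ball can be paired with every ball, itself included.
outcomes = list(product(balls, repeat=2))  # 11 * 11 ordered pairs


def color(ball):
    """Return 'R' or 'B', the first character of the ball's label."""
    return ball[0]


def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


p_rr = prob(lambda o: color(o[0]) == "R" and color(o[1]) == "R")
p_rb = prob(lambda o: color(o[0]) == "R" and color(o[1]) == "B")
p_rb_or_br = prob(lambda o: color(o[0]) != color(o[1]))

# Conditional: reduce the sample space to outcomes with blue on the 1st draw.
blue_first = [o for o in outcomes if color(o[0]) == "B"]
p_r2_given_b1 = Fraction(
    sum(color(o[1]) == "R" for o in blue_first), len(blue_first)
)

print(len(outcomes), p_rr, p_rb, p_rb_or_br, len(blue_first), p_r2_given_b1)
```

Because the first ball is replaced, the conditional probability of red on the second draw equals the unconditional 3/11.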

Example 4.17 

An urn has 3 red marbles and 8 blue marbles in it. Draw two marbles, one at a time, this time 
without replacement from the urn. "Without replacement" means that you do not put the first 
ball back before you select the second ball. Below is a tree diagram. The branches are labeled with 
probabilities instead of frequencies. The numbers at the ends of the branches are calculated by 
multiplying the numbers on the two corresponding branches, for example, (3/11)(2/10) = 6/110.



[Figure 4.2: tree diagram. The 1st Draw branches are labeled B 8/11 and R 3/11. The 2nd Draw branches are labeled 7/10 (B) and 3/10 (R) after a blue first draw, and 8/10 (B) and 2/10 (R) after a red first draw. The branch ends show 56/110 BB, 24/110 BR, 24/110 RB, and 6/110 RR.]

Figure 4.2: Total = (56 + 24 + 24 + 6)/110 = 110/110 = 1

NOTE: If you draw a red on the first draw from the 3 red possibilities, there are 2 red left to draw
on the second draw. You do not put back or replace the first ball after you have drawn it. You 
draw without replacement, so that on the second draw there are 10 marbles left in the urn. 

Calculate the following probabilities using the tree diagram. 

Problem 1 

P(RR) =

Solution

P(RR) = (3/11)(2/10) = 6/110



Problem 2 (Solution on p. 197.)

Fill in the blanks:

P(RB OR BR) = (3/11)(8/10) + (___)(___) = 48/110

Problem 3 (Solution on p. 197.)

P(R on 2nd | B on 1st) =

Problem 4 (Solution on p. 197.)

Fill in the blanks:

P(R on 1st and B on 2nd) = P(RB) = (___)(___) = 24/110

Problem 5 (Solution on p. 197.)

P(BB) =

Problem 6

P(B on 2nd | R on 1st) =

Solution 

There are 6 + 24 outcomes that have R on the first draw (6 RR and 24 RB). The 6 and the 24
are frequencies. They are also the numerators of the fractions 6/110 and 24/110. The sample space is no
longer 110 but 6 + 24 = 30. Twenty-four of the 30 outcomes have B on the second draw. The
probability is then 24/30. Did you get this answer?
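The frequencies 6 and 24 used in this solution can be confirmed by enumeration. The sketch below labels the marbles R1 to R3 and B1 to B8 (an illustrative labeling) and uses `itertools.permutations`, which never pairs a marble with itself and therefore models drawing without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

marbles = [f"R{i}" for i in range(1, 4)] + [f"B{i}" for i in range(1, 9)]

# Without replacement: ordered pairs of two different marbles.
outcomes = list(permutations(marbles, 2))  # 11 * 10 ordered pairs

# Tally outcomes by color pattern, e.g. "RB" = red first, blue second.
counts = Counter(first[0] + second[0] for first, second in outcomes)

p_rr = Fraction(counts["RR"], len(outcomes))

# Problem 6: reduce the sample space to outcomes with R on the first draw.
red_first = counts["RR"] + counts["RB"]
p_b2_given_r1 = Fraction(counts["RB"], red_first)

print(len(outcomes), dict(counts), p_rr, red_first, p_b2_given_r1)
```

The tallies match the numerators at the ends of the branches in Figure 4.2, and the conditional probability 24/30 reduces to 4/5, which is 8/10, the label on the corresponding second-draw branch.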



If we are using probabilities, we can label the tree in the following general way. 



[Figure: general probability tree. The second-draw branches are labeled with conditional probabilities such as P(B|B) and P(R|R), and the branch ends are labeled P(B and B) = P(BB), P(B and R) = P(BR), P(R and B) = P(RB), and P(R and R) = P(RR).]

• P(R | R) here means P(R on 2nd | R on 1st)

• P(B | R) here means P(B on 2nd | R on 1st)

• P(R | B) here means P(R on 2nd | B on 1st)

• P(B | B) here means P(B on 2nd | B on 1st)

4.8 Summary of Formulas 8



Formula 4.1: Complement 

If A and A' are complements then P (A) + P(A' ) = 1 

Formula 4.2: Addition Rule 

P(A OR B) = P(A) + P(B) - P(A AND B) 

Formula 4.3: Mutually Exclusive 

If A and B are mutually exclusive then P(A AND B) = 0; so P(A OR B) = P(A) + P(B).

Formula 4.4: Multiplication Rule

• P(A AND B) = P(B)P(A | B)

• P(A AND B) = P(A)P(B | A)

Formula 4.5: Independence

If A and B are independent then:

• P(A | B) = P(A)

• P(B | A) = P(B)

• P(A AND B) = P(A)P(B)
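To see the formulas at work on a concrete case, the sketch below checks them against the two-coin experiment of Example 4.14 (A = tails on the first coin, B = tails on the second coin). The helper names `prob` and `cond` are illustrative, and the check assumes the four outcomes are equally likely because the coins are fair.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coins: HH, HT, TH, TT.
S = {"".join(p) for p in product("HT", repeat=2)}
A = {s for s in S if s[0] == "T"}  # tails on the first coin
B = {s for s in S if s[1] == "T"}  # tails on the second coin


def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event), len(S))


def cond(event, given):
    """P(event | given): count within the reduced sample space."""
    return Fraction(len(event & given), len(given))


complement_ok = prob(A) + prob(S - A) == 1                      # Formula 4.1
addition_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)    # Formula 4.2
mutually_exclusive = prob(A & B) == 0                           # Formula 4.3
multiplication_ok = prob(A & B) == prob(B) * cond(A, B)         # Formula 4.4
independent = (                                                 # Formula 4.5
    cond(A, B) == prob(A) and prob(A & B) == prob(A) * prob(B)
)

print(complement_ok, addition_ok, mutually_exclusive,
      multiplication_ok, independent)
```

Here `A | B` and `A & B` are Python set union and intersection, not the conditional bar of P(A | B). In this experiment A and B are independent but not mutually exclusive, since both contain TT.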



8 This content is available online at <http://cnx.org/content/m16843/1.5/>.

4.9 Practice 1: Contingency Tables 9

4.9.1 Student Learning Outcomes 

• The student will construct and interpret contingency tables. 

4.9.2 Given 

An article in the New England Journal of Medicine reported on a study of smokers in California and
Hawaii. In one part of the report, the self-reported ethnicity and smoking levels per day were given. Of the
people smoking at most 10 cigarettes per day, there were 9886 African Americans, 2745 Native Hawaiians,
12,831 Latinos, 8378 Japanese Americans, and 7650 Whites. Of the people smoking 11-20 cigarettes per
day, there were 6514 African Americans, 3062 Native Hawaiians, 4932 Latinos, 10,680 Japanese Americans,
and 9877 Whites. Of the people smoking 21-30 cigarettes per day, there were 1671 African Americans, 1419
Native Hawaiians, 1406 Latinos, 4715 Japanese Americans, and 6062 Whites. Of the people smoking at least
31 cigarettes per day, there were 759 African Americans, 788 Native Hawaiians, 800 Latinos, 2305 Japanese
Americans, and 3970 Whites. (Source: http://www.nejm.org/doi/full/10.1056/NEJMoa033250)

4.9.3 Complete the Table 

Complete the table below using the data provided. 

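Smoking Levels by Ethnicity

Smoking Level   African American   Native Hawaiian   Latino   Japanese Americans   White   TOTALS
1-10            ___                ___               ___      ___                  ___     ___
11-20           ___                ___               ___      ___                  ___     ___
21-30           ___                ___               ___      ___                  ___     ___
31+             ___                ___               ___      ___                  ___     ___
TOTALS          ___                ___               ___      ___                  ___     ___

Table 4.4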



4.9.4 Analyze the Data 

Suppose that one person from the study is randomly selected. 

Exercise 4.9.1 (Solution on p. 197.)

Find the probability that the person smoked 11-20 cigarettes per day.

Exercise 4.9.2 (Solution on p. 197.)

Find the probability that the person was Latino.

9 This content is available online at <http://cnx.org/content/m16839/1.11/>.

4.9.5 Discussion Questions

Exercise 4.9.3 (Solution on p. 197.) 

In words, explain what it means to pick one person from the study and that person is "Japanese 
American AND smokes 21-30 cigarettes per day." Also, find the probability. 

Exercise 4.9.4 (Solution on p. 197.) 

In words, explain what it means to pick one person from the study and that person is "Japanese 
American OR smokes 21-30 cigarettes per day." Also, find the probability. 

Exercise 4.9.5 (Solution on p. 197.) 

In words, explain what it means to pick one person from the study and that person is "Japanese 
American GIVEN that person smokes 21-30 cigarettes per day." Also, find the probability. 

Exercise 4.9.6 

Prove that smoking level per day and ethnicity are dependent events.

4.10 Practice 2: Calculating Probabilities 10

4.10.1 Student Learning Outcomes 

• Students will define basic probability terms. 

• Students will calculate probabilities. 

• Students will determine whether two events are mutually exclusive or whether two events are
independent.

NOTE: Use probability rules to solve the problems below. Show your work. 

4.10.2 Given 

48% of all California registered voters prefer life in prison without parole over the death penalty for
a person convicted of first degree murder. Among Latino California registered voters, 55% prefer life
in prison without parole over the death penalty for a person convicted of first degree murder. (Source:
http://field.com/fieldpollonline/subscribers/Rls2393.pdf).
37.6% of all Californians are Latino (Source: U.S. Census Bureau).

In this problem, let:

• C = Californians (registered voters) preferring life in prison without parole over the death penalty for a person convicted of first degree murder

• L = Latino Californians 

Suppose that one Californian is randomly selected. 

4.10.3 Analyze the Data 

Exercise 4.10.1 (Solution on p. 197.) 

P(C) =

Exercise 4.10.2 (Solution on p. 197.)

P(L) =

Exercise 4.10.3 (Solution on p. 197.)

P(C | L) =

Exercise 4.10.4 

In words, what is " C | L"? 

Exercise 4.10.5 (Solution on p. 197.) 

P (L AND C) = 

Exercise 4.10.6 

In words, what is "L and C"? 

Exercise 4.10.7 (Solution on p. 198.) 

Are L and C independent events? Show why or why not. 

Exercise 4.10.8 (Solution on p. 198.) 

P (L OR C) = 

Exercise 4.10.9 

In words, what is "L or C"? 

Exercise 4.10.10 (Solution on p. 198.) 

Are L and C mutually exclusive events? Show why or why not.

10 This content is available online at <http://cnx.org/content/m16840/1.12/>.

4.11 Homework 11

Exercise 4.11.1 (Solution on p. 198.) 

Suppose that you have 8 cards. 5 are green and 3 are yellow. The 5 green cards are numbered 
1, 2, 3, 4, and 5. The 3 yellow cards are numbered 1, 2, and 3. The cards are well shuffled. You 
randomly draw one card. 

• G = card drawn is green 

• E = card drawn is even-numbered 

a. List the sample space. 
b. P(G) =

c. P(G | E) =

d. P(G AND E) =

e. P(G OR E) =

f. Are G and E mutually exclusive? Justify your answer numerically. 

Exercise 4.11.2 

Refer to the previous problem. Suppose that this time you randomly draw two cards, one at a 
time, and with replacement. 

• G1 = first card is green

• G2 = second card is green

a. Draw a tree diagram of the situation.

b. P(G1 AND G2) =

c. P(at least one green) =

d. P(G2 | G1) =

e. Are G2 and G1 independent events? Explain why or why not.

Exercise 4.11.3 (Solution on p. 198.) 

Refer to the previous problems. Suppose that this time you randomly draw two cards, one at a 
time, and without replacement. 

• G1 = first card is green

• G2 = second card is green

a. Draw a tree diagram of the situation.

b. P(G1 AND G2) =

c. P(at least one green) =

d. P(G2 | G1) =

e. Are G2 and G1 independent events? Explain why or why not.

Exercise 4.11.4 

Roll two fair dice. Each die has 6 faces. 

a. List the sample space. 

b. Let A be the event that either a 3 or 4 is rolled first, followed by an even number. Find P (A). 

c. Let B be the event that the sum of the two rolls is at most 7. Find P (B). 



d. In words, explain what "P(A | B)" represents. Find P(A | B).

e. Are A and B mutually exclusive events? Explain your answer in 1 - 3 complete sentences, including numerical justification.

f. Are A and B independent events? Explain your answer in 1 - 3 complete sentences, including numerical justification.

11 This content is available online at <http://cnx.org/content/m16836/1.21/>.

Exercise 4.11.5 (Solution on p. 198.) 

A special deck of cards has 10 cards. Four are green, three are blue, and three are red. When a 
card is picked, the color of it is recorded. An experiment consists of first picking a card and then 
tossing a coin. 

a. List the sample space. 

b. Let A be the event that a blue card is picked first, followed by landing a head on the coin toss. Find P(A).

c. Let B be the event that a red or green is picked, followed by landing a head on the coin toss. Are the events A and B mutually exclusive? Explain your answer in 1 - 3 complete sentences, including numerical justification.

d. Let C be the event that a red or blue is picked, followed by landing a head on the coin toss. Are the events A and C mutually exclusive? Explain your answer in 1 - 3 complete sentences, including numerical justification.

Exercise 4.11.6 

An experiment consists of first rolling a die and then tossing a coin: 

a. List the sample space. 

b. Let A be the event that either a 3 or 4 is rolled first, followed by landing a head on the coin toss. Find P(A).

c. Let B be the event that a number less than 2 is rolled, followed by landing a head on the coin toss. Are the events A and B mutually exclusive? Explain your answer in 1 - 3 complete sentences, including numerical justification.

Exercise 4.11.7 (Solution on p. 198.) 

An experiment consists of tossing a nickel, a dime and a quarter. Of interest is the side the coin 
lands on. 

a. List the sample space. 

b. Let A be the event that there are at least two tails. Find P(A). 

c. Let B be the event that the first and second tosses land on heads. Are the events A and B mutually exclusive? Explain your answer in 1 - 3 complete sentences, including justification.

Exercise 4.11.8 

Consider the following scenario: 

• Let P(C) = 0.4 

• Let P(D) = 0.5 

• Let P(C | D) = 0.6

a. Find P(C AND D) . 

b. Are C and D mutually exclusive? Why or why not? 

c. Are C and D independent events? Why or why not? 

d. Find P(C OR D) . 

e. Find P(D | C).

Exercise 4.11.9 (Solution on p. 198.)

E and F are mutually exclusive events. P(E) = 0.4; P(F) = 0.5. Find P(E | F).

Exercise 4.11.10

J and K are independent events. P(J | K) = 0.3. Find P(J).

Exercise 4.11.11 (Solution on p. 198.)

U and V are mutually exclusive events. P(U) = 0.26; P(V) = 0.37. Find:

a. P(U AND V) =

b. P(U | V) =

c. P(U OR V) =

Exercise 4.11.12

Q and R are independent events. P(Q) = 0.4; P(Q AND R) = 0.1. Find P(R).

Exercise 4.11.13 (Solution on p. 198.)

Y and Z are independent events.

a. Rewrite the basic Addition Rule P(Y OR Z) = P(Y) + P(Z) - P(Y AND Z) using the information that Y and Z are independent events.

b. Use the rewritten rule to find P(Z) if P(Y OR Z) = 0.71 and P(Y) = 0.42.



Exercise 4.11.14 

G and H are mutually exclusive events. P (G) = 0.5; P (H) = 0.3 

a. Explain why the following statement MUST be false: P (H | G) = 0.4 . 

b. Find P(H OR G).

c. Are G and H independent or dependent events? Explain in a complete sentence. 

Exercise 4.11.15 (Solution on p. 198.) 

The following are real data from Santa Clara County, CA. As of a certain time, there had been 
a total of 3059 documented cases of AIDS in the county. They were grouped into the following 
categories (Source: Santa Clara County Public H.D.): 





          Homosexual/Bisexual   IV Drug User*   Heterosexual Contact   Other   Totals
Female    0                     70              136                    49      ___
Male      2146                  463             60                     135     ___
Totals    ___                   ___             ___                    ___     ___

Table 4.5: * includes homosexual/bisexual IV drug users

Suppose one of the persons with AIDS in Santa Clara County is randomly selected. Compute the 
following: 

a. P(person is female) = 

b. P(person has a risk factor Heterosexual Contact) = 

c. P(person is female OR has a risk factor of IV Drug User) = 

d. P(person is female AND has a risk factor of Homosexual/Bisexual) =

e. P(person is male AND has a risk factor of IV Drug User) =

f. P(female GIVEN person got the disease from heterosexual contact) =

g. Construct a Venn Diagram. Make one group females and the other group heterosexual contact.

Exercise 4.11.16

Solve these questions using probability rules. Do NOT use the contingency table above. 3059 
cases of AIDS had been reported in Santa Clara County, CA, through a certain date. Those cases 
will be our population. Of those cases, 6.4% obtained the disease through heterosexual contact 
and 7.4% are female. Out of the females with the disease, 53.3% got the disease from heterosexual 
contact. 

a. P(person is female) = 

b. P(person obtained the disease through heterosexual contact) = 

c. P(female GIVEN person got the disease from heterosexual contact) = 

d. Construct a Venn Diagram. Make one group females and the other group heterosexual contact. Fill in all values as probabilities.

Exercise 4.11.17 (Solution on p. 199.) 

The following table identifies a group of children by one of four hair colors, and by type of hair. 



Hair Type   Brown   Blond   Black   Red   Totals
Wavy        20      ___     15      3     43
Straight    80      15      ___     12    ___
Totals      ___     20      ___     ___   215

Table 4.6



a. Complete the table above. 

b. What is the probability that a randomly selected child will have wavy hair? 

c. What is the probability that a randomly selected child will have either brown or blond hair? 

d. What is the probability that a randomly selected child will have wavy brown hair? 

e. What is the probability that a randomly selected child will have red hair, given that he has straight hair?

f. If B is the event of a child having brown hair, find the probability of the complement of B. 

g. In words, what does the complement of B represent? 

Exercise 4.11.18 

In a previous year, the weights of the members of the San Francisco 49ers and the Dallas Cowboys
were published in the San Jose Mercury News. The factual data are compiled into the following
table.

Shirt#   ≤ 210   211-250   251-290   290 ≤
1-33     21      5         0         0
34-66    6       18        7         4
66-99    6       12        22        5

Table 4.7

For the following, suppose that you randomly select one player from the 49ers or Cowboys.

a. Find the probability that his shirt number is from 1 to 33. 

b. Find the probability that he weighs at most 210 pounds. 






c. Find the probability that his shirt number is from 1 to 33 AND he weighs at most 210 pounds. 

d. Find the probability that his shirt number is from 1 to 33 OR he weighs at most 210 pounds. 

e. Find the probability that his shirt number is from 1 to 33 GIVEN that he weighs at most 210 pounds.

f. If having a shirt number from 1 to 33 and weighing at most 210 pounds were independent events, then what should be true about P(Shirt# 1-33 | ≤ 210 pounds)?

Exercise 4.11.19 (Solution on p. 199.) 

Approximately 281,000,000 people over age 5 live in the United States. Of these people, 55,000,000
speak a language other than English at home. Of those who speak another language at home, 62.3%
speak Spanish. (Source: http://www.census.gov/hhes/socdemo/language/data/acs/ACS-12.pdf)

Let: E = speak English at home; E' = speak another language at home; S = speak Spanish

Finish each probability statement by matching the correct answer. 



Probability Statements   Answers
a. P(E') =               i. 0.8043
b. P(E) =                ii. 0.623
c. P(S and E') =         iii. 0.1957
d. P(S | E') =           iv. 0.1219

Table 4.8



Exercise 4.11.20 

The probability that a male develops some form of cancer in his lifetime is 0.4567 (Source: American
Cancer Society). The probability that a male has at least one false positive test result (meaning
the test comes back positive for cancer when the man does not have it) is 0.51 (Source: USA Today). Some of
the questions below do not have enough information for you to answer them. Write "not enough
information" for those answers.

Let: C = a man develops cancer in his lifetime; P = man has at least one false positive 

a. Construct a tree diagram of the situation. 

b. P(C) =

c. P(P | C) =

d. P(P | C') =

e. If a test comes up positive, based upon numerical values, can you assume that man has cancer? Justify numerically and explain why or why not.

Exercise 4.11.21 (Solution on p. 199.) 

In 1994, the U.S. government held a lottery to issue 55,000 Green Cards (permits for non-citizens 
to work legally in the U.S.). Renate Deutsch, from Germany, was one of approximately 6.5 million 
people who entered this lottery. Let G = won Green Card. 

a. What was Renate's chance of winning a Green Card? Write your answer as a probability statement.

b. In the summer of 1994, Renate received a letter stating she was one of 110,000 finalists chosen. Once the finalists were chosen, assuming that each finalist had an equal chance to win, what was Renate's chance of winning a Green Card? Let F = was a finalist. Write your answer as a conditional probability statement.

c. Are G and F independent or dependent events? Justify your answer numerically and also explain why.

d. Are G and F mutually exclusive events? Justify your answer numerically and also explain why.

NOTE: P.S. Amazingly, on 2/1/95, Renate learned that she would receive her Green Card - true story!
Exercise 4.11.22 

Three professors at George Washington University did an experiment to determine if economists 
are more selfish than other people. They dropped 64 stamped, addressed envelopes with $10 cash 
in different classrooms on the George Washington campus. 44% were returned overall. From the 
economics classes 56% of the envelopes were returned. From the business, psychology, and history 
classes 31% were returned. (Source: Wall Street Journal) 

Let: R = money returned; E = economics classes; O = other classes 

a. Write a probability statement for the overall percent of money returned. 

b. Write a probability statement for the percent of money returned out of the economics classes. 

c. Write a probability statement for the percent of money returned out of the other classes. 

d. Is money being returned independent of the class? Justify your answer numerically and explain it.

e. Based upon this study, do you think that economists are more selfish than other people? Explain why or why not. Include numbers to justify your answer.

Exercise 4.11.23 (Solution on p. 199.) 

The chart below gives the number of suicides estimated in the U.S. for a recent year by age, race 
(black and white), and sex. We are interested in possible relationships between age, race, and sex. 
We will let suicide victims be our population. (Source: The National Center for Health Statistics, 
U.S. Dept. of Health and Human Services) 



Race and Sex    1-14   15-24   25-64    over 64   TOTALS
white, male     210    3360    13,610   ___       22,050
white, female   80     580     3380     ___       4930
black, male     10     460     1060     ___       1670
black, female   0      40      270      ___       330
all others      ___    ___     ___      ___       ___
TOTALS          310    4650    18,780   ___       29,760

Table 4.9



NOTE: Do not include "all others" for parts (f), (g), and (i). 



a. Fill in the column for the suicides for individuals over age 64. 

b. Fill in the row for all other races. 

c. Find the probability that a randomly selected individual was a white male. 

d. Find the probability that a randomly selected individual was a black female. 

e. Find the probability that a randomly selected individual was black.

f. Find the probability that a randomly selected individual was male.

g. Out of the individuals over age 64, find the probability that a randomly selected individual was a black or white male.

h. Comparing "Race and Sex" to "Age," which two groups are mutually exclusive? How do you know?

i. Are being male and committing suicide over age 64 independent events? How do you know?

The next two questions refer to the following: The percent of licensed U.S. drivers (from a recent year) 
that are female is 48.60. Of the females, 5.03% are age 19 and under; 81.36% are age 20 - 64; 13.61% are age 
65 or over. Of the licensed U.S. male drivers, 5.04% are age 19 and under; 81.43% are age 20 - 64; 13.53% are 
age 65 or over. (Source: Federal Highway Administration, U.S. Dept. of Transportation) 

Exercise 4.11.24 

Complete the following: 

a. Construct a table or a tree diagram of the situation. 

b. P(driver is female) = 

c. P(driver is age 65 or over | driver is female) =

d. P(driver is age 65 or over AND female) =

e. In words, explain the difference between the probabilities in part (c) and part (d).

f. P(driver is age 65 or over) =

g. Are being age 65 or over and being female mutually exclusive events? How do you know?

Exercise 4.11.25 (Solution on p. 199.) 

Suppose that 10,000 U.S. licensed drivers are randomly selected. 

a. How many would you expect to be male? 

b. Using the table or tree diagram from the previous exercise, construct a contingency table of gender versus age group.

c. Using the contingency table, find the probability that out of the age 20 - 64 group, a randomly selected driver is female.

Exercise 4.11.26 

Approximately 86.5% of Americans commute to work by car, truck or van. Out of that group,
84.6% drive alone and 15.4% drive in a carpool. Approximately 3.9% walk to work and
approximately 5.3% take public transportation. (Source: Bureau of the Census, U.S. Dept. of Commerce.
Disregard rounding approximations.)

a. Construct a table or a tree diagram of the situation. Include a branch for all other modes of transportation to work.

b. Assuming that the walkers walk alone, what percent of all commuters travel alone to work?

c. Suppose that 1000 workers are randomly selected. How many would you expect to travel alone to work?

d. Suppose that 1000 workers are randomly selected. How many would you expect to drive in a carpool?

Exercise 4.11.27 

Explain what is wrong with the following statements. Use complete sentences. 

a. If there's a 60% chance of rain on Saturday and a 70% chance of rain on Sunday, then there's a 130% chance of rain over the weekend.

b. The probability that a baseball player hits a home run is greater than the probability that he gets a successful hit.






4.11.1 Try these multiple choice questions. 

The next two questions refer to the following probability tree diagram which shows tossing an unfair 
coin FOLLOWED BY drawing one bead from a cup containing 3 red (R), 4 yellow (Y) and 5 blue (B) beads. 
For the coin, P(H) = 2/3 and P(T) = 1/3 where H = "heads" and T = "tails".



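[Figure 4.3: probability tree. The two coin branches are labeled 2/3 and 1/3; each is followed by three bead branches labeled R 3/12, Y 4/12, and B 5/12.]

Figure 4.3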



Exercise 4.11.28 (Solution on p. 200.)

Find P(tossing a Head on the coin AND a Red bead)

A. 2/3
B. 5/15
C. 6/36
D. 5/36

Exercise 4.11.29 (Solution on p. 200.)

Find P(Blue bead)

A. 15/36
B. 10/36
C. 10/12
D. 6/36



The next three questions refer to the following table of data obtained from www.baseball-almanac.com 12 
showing hit information for 4 well known baseball players. Suppose that one hit from the table is randomly 
selected. 



2 http://cnx.org/content/ml6836/latest/ www.baseball-almanac.com 
NAME              Single   Double   Triple   Home Run   TOTAL HITS
Babe Ruth           1517      506      136        714         2873
Jackie Robinson     1054      273       54        137         1518
Ty Cobb             3603      174      295        114         4189
Hank Aaron          2294      624       98        755         3771
TOTAL               8471     1577      583       1720        12351

Table 4.10



Exercise 4.11.30 (Solution on p. 200.)

Find P(hit was made by Babe Ruth).

A. 1518/2873

B. 2873/12351

C. 583/12351

D. 4189/12351

Exercise 4.11.31 (Solution on p. 200.)

Find P(hit was made by Ty Cobb | The hit was a Home Run)

A. 4189/12351

B. 114/1720

C. 1720/4189

D. [numerator illegible]/12351

Exercise 4.11.32 (Solution on p. 200.) 

Are the hit being made by Hank Aaron and the hit being a double independent events? 

A. Yes, because P(hit by Hank Aaron | hit is a double) = P(hit by Hank Aaron)

B. No, because P(hit by Hank Aaron | hit is a double) ≠ P(hit is a double)

C. No, because P(hit is by Hank Aaron | hit is a double) ≠ P(hit by Hank Aaron)

D. Yes, because P(hit is by Hank Aaron | hit is a double) = P(hit is a double)



Exercise 4.11.33 (Solution on p. 200.)

Given events G and H: P(G) = 0.43; P(H) = 0.26; P(H and G) = 0.14

A. Find P(H or G)

B. Find the probability of the complement of event (H and G)

C. Find the probability of the complement of event (H or G)

Exercise 4.11.34 (Solution on p. 200.)

Given events J and K: P(J) = 0.18; P(K) = 0.37; P(J or K) = 0.45

A. Find P(J and K)

B. Find the probability of the complement of event (J and K)

C. Find the probability of the complement of event (J or K)
Exercise 4.11.35 (Solution on p. 200.)

United Blood Services is a blood bank that serves more than 500 hospitals in 18 states. According
to their website, http://www.unitedbloodservices.org/humanbloodtypes.html, a person with
type O blood and a negative Rh factor (Rh-) can donate blood to any person with any blood type.
Their data show that 43% of people have type O blood and 15% of people have Rh- factor; 52%
of people have type O or Rh- factor.

A. Find the probability that a person has both type O blood and the Rh- factor.

B. Find the probability that a person does NOT have both type O blood and the Rh- factor.

Exercise 4.11.36 (Solution on p. 200.) 

At a college, 72% of courses have final exams and 46% of courses require research papers. Suppose 
that 32% of courses have a research paper and a final exam. Let F be the event that a course has a 
final exam. Let R be the event that a course requires a research paper. 

A. Find the probability that a course has a final exam or a research paper.

B. Find the probability that a course has NEITHER of these two requirements. 

Exercise 4.11.37 (Solution on p. 200.) 

In a box of assorted cookies, 36% contain chocolate and 12% contain nuts. Of all the cookies in the
box, 8% contain both chocolate and nuts. Sean is allergic to both chocolate and nuts.

A. Find the probability that a cookie contains chocolate or nuts (he can't eat it). 

B. Find the probability that a cookie does not contain chocolate or nuts (he can eat it). 

Exercise 4.11.38 (Solution on p. 200.) 

A college finds that 10% of students have taken a distance learning class and that 40% of students 
are part time students. Of the part time students, 20% have taken a distance learning class. Let D 
= event that a student takes a distance learning class and E = event that a student is a part time 
student 

A. Find P(D and E)

B. Find P(E | D)

C. Find P(D or E)

D. Using an appropriate test, show whether D and E are independent. 

E. Using an appropriate test, show whether D and E are mutually exclusive. 

Exercise 4.11.39 (Solution on p. 200.) 

When the Euro coin was introduced in 2002, two math professors had their statistics students test 
whether the Belgian 1 Euro coin was a fair coin. They spun the coin rather than tossing it, and it 
was found that out of 250 spins, 140 showed a head (event H) while 110 showed a tail (event T). 
Therefore, they claim that this is not a fair coin. 

A. Based on the data above, find P(H) and P(T). 

B. Use a tree to find the probabilities of each possible outcome for the experiment of tossing the coin twice.

C. Use the tree to find the probability of obtaining exactly one head in two tosses of the coin. 

D. Use the tree to find the probability of obtaining at least one head. 

Exercise 4.11.40 

A box of cookies contains 3 chocolate and 7 butter cookies. Miguel randomly selects a cookie and 
eats it. Then he randomly selects another cookie and eats it also. (How many cookies did he take?) 

A. Draw the tree that represents the possibilities for the cookie selections. Write the probabilities along each branch of the tree.

B. Are the probabilities for the flavor of the SECOND cookie that Miguel selects independent of his first selection? Explain.

C. For each complete path through the tree, write the event it represents and find the probabilities.

D. Let S be the event that both cookies selected were the same flavor. Find P(S).

E. Let T be the event that both cookies selected were different flavors. Find P(T) by two different
methods: by using the complement rule and by using the branches of the tree. Your answers
should be the same with both methods.

F. Let U be the event that the second cookie selected is a butter cookie. Find P(U). 

**Exercises 33 - 40 contributed by Roberta Bloom 

4.12 Review 13

The first six exercises refer to the following study: In a survey of 100 stocks on NASDAQ, the average 
percent increase for the past year was 9% for NASDAQ stocks. Answer the following: 

Exercise 4.12.1 (Solution on p. 201.) 

The "average increase" for all NASDAQ stocks is the: 

A. Population 

B. Statistic 

C. Parameter 

D. Sample 

E. Variable 

Exercise 4.12.2 (Solution on p. 201.) 

All of the NASDAQ stocks are the: 

A. Population 

B. Statistic 

C. Parameter 

D. Sample 

E. Variable 

Exercise 4.12.3 (Solution on p. 201.) 

9% is the: 

A. Population 

B. Statistic 

C. Parameter 

D. Sample 

E. Variable 

Exercise 4.12.4 (Solution on p. 201.) 

The 100 NASDAQ stocks in the survey are the: 

A. Population 

B. Statistic 

C. Parameter 

D. Sample 

E. Variable 

Exercise 4.12.5 (Solution on p. 201.) 

The percent increase for one stock in the survey is the: 

A. Population 

B. Statistic 

C. Parameter 

D. Sample 

E. Variable 



13 This content is available online at <http://cnx.org/content/m16842/1.9/>.

Exercise 4.12.6 (Solution on p. 201.)

Would the data collected be qualitative, quantitative - discrete, or quantitative - continuous? 

The next two questions refer to the following study: Thirty people spent two weeks around Mardi Gras 
in New Orleans. Their two-week weight gain is below. (Note: a loss is shown by a negative weight gain.) 



Weight Gain   Frequency
-2            3
-1            5
0             2
1             4
4             13
6             2
11            1

Table 4.11



Exercise 4.12.7 (Solution on p. 201.)

Calculate the following values:

a. The average weight gain for the two weeks

b. The standard deviation

c. The first, second, and third quartiles



Exercise 4.12.8 

Construct a histogram and a boxplot of the data. 

4.13 Lab: Probability Topics 14



Class time: 
Names: 



4.13.1 Student Learning Outcomes: 



• The student will use theoretical and empirical methods to estimate probabilities. 

• The student will appraise the differences between the two estimates. 

• The student will demonstrate an understanding of long-term relative frequencies. 



4.13.2 Do the Experiment: 

Count out 40 mixed-color M&M's®, which is approximately 1 small bag's worth (distance learning classes
using the virtual lab would want to count out 25 M&M's®). Record the number of each color in the
"Population" table. Use the information from this table to complete the theoretical probability questions.
Next, put the M&M's in a cup. The experiment is to pick 2 M&M's, one at a time. Do not look at them as you
pick them. The first time through, replace the first M&M before picking the second one. Record the results
in the "With Replacement" column of the "Empirical Results" table. Do this 24 times. The second time
through, after picking the first M&M, do not replace it before picking the second one. Record the results
in the "Without Replacement" column of the "Empirical Results" table. After you record each pick, put
both M&M's back. Do this a total of 24 times, also. Use the data from the "Empirical Results" table to
answer the empirical probability questions. Leave your answers in unreduced fractional form.
Do not multiply out any fractions.
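
For classes that want to rehearse the experiment on a computer, the procedure above can be sketched in Python. This is only an illustration under stated assumptions: the color counts in `population` are hypothetical (use the counts from your own bag), and the function names are made up for this sketch. It computes the theoretical P(2 reds) both ways and simulates 24 picks of 2 M&M's.

```python
import random
from fractions import Fraction

# Hypothetical population of 40 M&M's; replace with the counts from your own bag.
population = {"Y": 6, "G": 7, "BL": 8, "B": 5, "O": 8, "R": 6}

def p_two_reds(counts, replace):
    """Theoretical P(2 reds) for two picks, with or without replacement."""
    total = sum(counts.values())
    red = counts["R"]
    if replace:
        return Fraction(red, total) * Fraction(red, total)
    return Fraction(red, total) * Fraction(red - 1, total - 1)

def run_trials(counts, replace, trials=24, seed=0):
    """Simulate `trials` repetitions of picking 2 M&M's; return (first, second) pairs."""
    rng = random.Random(seed)
    cup = [color for color, n in counts.items() for _ in range(n)]
    results = []
    for _ in range(trials):
        if replace:
            pair = (rng.choice(cup), rng.choice(cup))  # first M&M goes back in the cup
        else:
            pair = tuple(rng.sample(cup, 2))           # two distinct M&M's
        results.append(pair)
    return results

def empirical_p_two_reds(results):
    """Relative frequency of (R, R) among the recorded pairs."""
    hits = sum(1 for first, second in results if first == "R" and second == "R")
    return Fraction(hits, len(results))

with_replacement = run_trials(population, replace=True)
without_replacement = run_trials(population, replace=False)
print(p_two_reds(population, True), empirical_p_two_reds(with_replacement))
print(p_two_reds(population, False), empirical_p_two_reds(without_replacement))
```

Note that `Fraction` reduces its results automatically, while the lab asks for unreduced fractions, so write your own answers by hand in the form the tables require.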

Population

Color         Quantity
Yellow (Y)
Green (G)
Blue (BL)
Brown (B)
Orange (O)
Red (R)

Table 4.12



14 This content is available online at <http://cnx.org/content/m16841/1.15/>.

Theoretical Probabilities

                       With Replacement    Without Replacement
P(2 reds)
P(R1 B2 OR B1 R2)
P(R1 AND G2)
P(G2 | R1)
P(no yellows)
P(doubles)
P(no doubles)

Table 4.13: Note: G2 = green on second pick; R1 = red on first pick; B1 = brown on first pick; B2 = brown on
second pick; doubles = both picks are the same colour.

Empirical Results

With Replacement                   Without Replacement
( __ , __ )  ( __ , __ )           ( __ , __ )  ( __ , __ )

[Blank recording table: one ordered pair ( __ , __ ) per repetition, 24 pairs in each column.]

Table 4.14

Empirical Probabilities

                       With Replacement    Without Replacement
P(2 reds)
P(R1 B2 OR B1 R2)
P(R1 AND G2)
P(G2 | R1)
P(no yellows)
P(doubles)
P(no doubles)

Table 4.15



4.13.3 Discussion Questions 

1. Why are the "With Replacement" and "Without Replacement" probabilities different?

2. Convert P(no yellows) to decimal format for both Theoretical "With Replacement" and for Empirical
"With Replacement". Round to 4 decimal places.

a. Theoretical "With Replacement": P(no yellows) =

b. Empirical "With Replacement": P(no yellows) =

c. Are the decimal values "close"? Did you expect them to be closer together or farther apart? Why?

3. If you increased the number of times you picked 2 M&M's to 240 times, why would empirical
probability values change?

4. Would this change (see (3) above) cause the empirical probabilities and theoretical probabilities to be
closer together or farther apart? How do you know?

5. Explain the differences in what P(G1 AND R2) and P(R1 | G2) represent. Hint: Think about the
sample space for each probability.

Solutions to Exercises in Chapter 4

Solution to Exercise 4.2.1 (p. 160)

a. P(L') = P(S)

b. P(M or S)

c. P(F and L)

d. P(M | L)

e. P(L | M)

f. P(S | F)

g. P(F | L)

h. P(F or L)

i. P(M and S)

j. P(F)

Solution to Example 4.2, Problem (p. 162)

No. C = {3, 5} and E = {1, 2, 3, 4}. P(C AND E) = 1/6. To be mutually exclusive, P(C AND E) must be 0.

Solution to Example 4.11, Problem 1 (p. 168)

Hiking Area Preference

Sex      The Coastline   Near Lakes and Streams   On Mountain Peaks   Total
Female              18                       16                  11      45
Male                16                       25                  14      55
Total               34                       41                  25     100

Table 4.16



Solution to Example 4.11, Problem 2 (p. 168)

a. P(F AND C) = 18/100 = 0.18

b. P(F) · P(C) = (45/100)(34/100) = 0.45 · 0.34 = 0.153

P(F AND C) ≠ P(F) · P(C), so the events F and C are not independent.

Solution to Example 4.11, Problem 3 (p. 169)

a. The word 'given' tells you that this is a conditional.

b. P(M | L) = 25/41

c. No, the sample space for this problem is 41.

Solution to Example 4.11, Problem 4 (p. 169)

a. P(F) = 45/100

b. P(P) = 25/100

c. P(F AND P) = 11/100

d. P(F OR P) = 45/100 + 25/100 - 11/100 = 59/100

Solution to Example 4.12, Problem 1 (p. 169)

Door Choice

Caught or Not   Door One   Door Two   Door Three   Total
Caught          1/15       1/12       1/6          19/60
Not Caught      4/15       3/12       1/6          41/60
Total           5/15       4/12       2/6          1

Table 4.17



Solution to Example 4.16, Problem 1 (p. 172)

B1R1; B1R2; B1R3; B2R1; B2R2; B2R3; B3R1; B3R2; B3R3; B4R1; B4R2; B4R3; B5R1; B5R2; B5R3; B6R1;
B6R2; B6R3; B7R1; B7R2; B7R3; B8R1; B8R2; B8R3

Solution to Example 4.16, Problem 6 (p. 173)

P(BB) = 64/121

Solution to Example 4.16, Problem 7 (p. 173)

P(B on 2nd draw | R on 1st draw) = 8/11

There are 9 + 24 outcomes that have R on the first draw (9 RR and 24 RB). The sample space is then
9 + 24 = 33. Twenty-four of the 33 outcomes have B on the second draw. The probability is then 24/33.

Solution to Example 4.17, Problem 2 (p. 174)

P(RB or BR) = (3/11)(8/10) + (8/11)(3/10) = 48/110

Solution to Example 4.17, Problem 3 (p. 174)

P(R on 2nd | B on 1st) = 3/10

Solution to Example 4.17, Problem 4 (p. 174)

P(R on 1st and B on 2nd) = P(RB) = (3/11)(8/10) = 24/110

Solution to Example 4.17, Problem 5 (p. 174)

P(BB) = (8/11)(7/10)

Solutions to Practice 1: Contingency Tables 

Solution to Exercise 4.9.1 (p. 177)

35,065/100,450

Solution to Exercise 4.9.2 (p. 177)

19,969/100,450

Solution to Exercise 4.9.3 (p. 178)

4,715/100,450

Solution to Exercise 4.9.4 (p. 178)

36,636/100,450

Solution to Exercise 4.9.5 (p. 178)

4,715/15,273



Solutions to Practice 2: Calculating Probabilities 

Solution to Exercise 4.10.1 (p. 179) 

0.48 

Solution to Exercise 4.10.2 (p. 179) 

0.376 

Solution to Exercise 4.10.3 (p. 179) 

0.55 

Solution to Exercise 4.10.5 (p. 179)

0.2068 

Solution to Exercise 4.10.7 (p. 179) 

No 

Solution to Exercise 4.10.8 (p. 179) 

0.6492 

Solution to Exercise 4.10.10 (p. 179) 

No 



b- (|) (I 



Solutions to Homework 

Solution to Exercise 4.11.1 (p. 180) 

a. {G1, G2, G3, G4, G5, Y1, Y2, Y3}

b- I 

d. § 

e. 1 

f. No 

Solution to Exercise 4.11.3 (p. 180) 

(I) (I) 

c (|) (I) + (§) (?) + (f) (I) 

d. « 

e. No 

Solution to Exercise 4.11.5 (p. 181) 
a. {GH,GT,BH,BT,RH,RT} 

D. 20 

c. Yes 

d. No 

Solution to Exercise 4.11.7 (p. 181) 

a. {(HHH) , (HHT) , (HTH) , (HTT) , (THH) , (THT) , (TTH) , (TTT)} 

b. g 

c. Yes 

Solution to Exercise 4.11.9 (p. 182) 



Solution to Exercise 4.11.11 (p. 182) 

a. 

b. 

c. 0.63 

Solution to Exercise 4.11.13 (p. 182) 

b. 0.5 

Solution to Exercise 4.11.15 (p. 182) 

The completed contingency table is as follows: 

          Homosexual/Bisexual   IV Drug User*   Heterosexual Contact   Other   Totals
Female                      0              70                    136      49      255
Male                     2146             463                     60     135     2804
Totals                   2146             533                    196     184     3059

Table 4.18: * includes homosexual/bisexual IV drug users



a. 255/3059

b ^ 

v - 3059 

*- 3059 

d. 

P J63_ 

e- 3059 

f. 136/196

Solution to Exercise 4.11.17 (p. 183) 
b. 



43/215
120/215
20/215
12/172
115/215



Solution to Exercise 4.11.19 (p. 184) 

a. iii 

b. i 

c. iv 

d. ii 

Solution to Exercise 4.11.21 (p. 184) 

a. P(G) =0.008 

b. 0.5 

c. dependent 

d. No 



Solution to Exercise 4.11.23 (p. 185) 



r 22050 
*- 29760 

u - 29760 

_ 2000 

e- 29760 

r 23720 

r< 29760 

a 5010 

&• 6020 

h. Black females and ages 1-14 
i. No 



Solution to Exercise 4.11.25 (p. 186) 



a. 5140 
c. 0.49 

Solution to Exercise 4.11.28 (p. 187)

C 

Solution to Exercise 4.11.29 (p. 187) 

A 

Solution to Exercise 4.11.30 (p. 188) 

B 

Solution to Exercise 4.11.31 (p. 188) 

B 

Solution to Exercise 4.11.32 (p. 188) 

C 

Solution to Exercise 4.11.33 (p. 188) 

A. P(H or G) = P(H) + P(G) - P(H and G) = 0.26 + 0.43 - 0.14 = 0.55 

B. P( NOT (H and G) ) = 1 - P(H and G) = 1 - 0.14 = 0.86 

C. P( NOT (H or G) ) = 1 - P(H or G) = 1 - 0.55 = 0.45 

Solution to Exercise 4.11.34 (p. 188) 

A. P(J or K) = P(J) + P(K) - P(J and K); 0.45 = 0.18 + 0.37 - P(J and K) ; solve to find P(J and K) = 0.10 

B. P( NOT (J and K) ) = 1 - P(J and K) = 1 - 0.10 = 0.90

C. P( NOT (J or K) ) = 1 - P(J or K) = 1 - 0.45 = 0.55 

Solution to Exercise 4.11.35 (p. 189) 

A. P(Type O or Rh-) = P(Type O) + P(Rh-) - P(Type O and Rh-)

0.52 = 0.43 + 0.15 - P(Type O and Rh-); solve to find P(Type O and Rh-) = 0.06
6% of people have type O Rh- blood

B. P( NOT (Type O and Rh-) ) = 1 - P(Type O and Rh-) = 1 - 0.06 = 0.94

94% of people do not have type O Rh- blood

Solution to Exercise 4.11.36 (p. 189) 

A. P(R or F) = P(R) + P(F) - P(R and F) = 0.46 + 0.72 - 0.32 = 0.86

B. P( Neither R nor F ) = 1 - P(R or F) = 1 - 0.86 = 0.14 

Solution to Exercise 4.11.37 (p. 189) 

Let C be the event that the cookie contains chocolate. Let N be the event that the cookie contains nuts. 

A. P(C or N) = P(C) + P(N) - P(C and N) = 0.36 + 0.12 - 0.08 = 0.40 

B. P( neither chocolate nor nuts) = 1 - P(C or N) = 1 - 0.40 = 0.60 

Solution to Exercise 4.11.38 (p. 189) 

A. P(D and E) = P(D | E)P(E) = (0.20)(0.40) = 0.08

B. P(E | D) = P(D and E) / P(D) = 0.08/0.10 = 0.80

C. P(D or E) = P(D) + P(E) - P(D and E) = 0.10 + 0.40 - 0.08 = 0.42

D. Not Independent: P(D | E) = 0.20 which does not equal P(D) = 0.10

E. Not Mutually Exclusive: P(D and E) = 0.08; if they were mutually exclusive then we would need to have
P(D and E) = 0, which is not true here.
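
As a cross-check, the five steps of this solution can be reproduced with exact fractions. The sketch below is illustrative only; the variable names are made up here, and the three input probabilities are the ones given in Exercise 4.11.38.

```python
from fractions import Fraction

# Given in the exercise.
p_d = Fraction(10, 100)          # P(D): student has taken a distance learning class
p_e = Fraction(40, 100)          # P(E): student is a part time student
p_d_given_e = Fraction(20, 100)  # P(D | E)

p_d_and_e = p_d_given_e * p_e        # multiplication rule: P(D and E) = P(D | E) P(E)
p_e_given_d = p_d_and_e / p_d        # conditional probability: P(E | D)
p_d_or_e = p_d + p_e - p_d_and_e     # addition rule: P(D or E)

independent = (p_d_given_e == p_d)       # independent only if P(D | E) = P(D)
mutually_exclusive = (p_d_and_e == 0)    # mutually exclusive only if P(D and E) = 0

print(float(p_d_and_e), float(p_e_given_d), float(p_d_or_e))
print(independent, mutually_exclusive)
```

Using `Fraction` instead of decimals avoids floating-point rounding, so the equality tests in the last two steps are exact.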

Solution to Exercise 4.11.39 (p. 189) 

A. P(H) = 140/250; P(T) = 110/250 

C. 308/625 

D. 504/625 
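
The tree for part B is not printed in this solution, but it can be sketched by enumerating the four two-toss paths. This is a minimal illustration that assumes the two tosses are independent and uses the probabilities from part A; the names `p` and `tree` are made up for the sketch.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities from part A.
p = {"H": Fraction(140, 250), "T": Fraction(110, 250)}

# Each complete path through the tree (HH, HT, TH, TT) gets the product
# of the probabilities along its branches (assuming independent tosses).
tree = {first + second: p[first] * p[second]
        for first, second in product("HT", repeat=2)}

exactly_one_head = sum(prob for path, prob in tree.items() if path.count("H") == 1)
at_least_one_head = 1 - tree["TT"]   # complement of "no heads"

for path, prob in tree.items():
    print(path, prob)
print(exactly_one_head, at_least_one_head)
```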

Solutions to Review

Solution to Exercise 4.12.1 (p. 191)

C. Parameter 

Solution to Exercise 4.12.2 (p. 191) 

A. Population 

Solution to Exercise 4.12.3 (p. 191) 

B. Statistic 

Solution to Exercise 4.12.4 (p. 191) 

D. Sample 

Solution to Exercise 4.12.5 (p. 191) 

E. Variable 

Solution to Exercise 4.12.6 (p. 192) 
quantitative - continuous 
Solution to Exercise 4.12.7 (p. 192) 

a. 2.27 

b. 3.04 

c. -1,4,4 






Chapter 5 

Discrete Random Variables 

5.1 Discrete Random Variables 1 

5.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Recognize and understand discrete probability distribution functions, in general. 

• Calculate and interpret expected values. 

• Recognize the binomial probability distribution and apply it appropriately. 

• Recognize the Poisson probability distribution and apply it appropriately (optional). 

• Recognize the geometric probability distribution and apply it appropriately (optional). 

• Recognize the hypergeometric probability distribution and apply it appropriately (optional). 

• Classify discrete word problems by their distributions. 

5.1.2 Introduction 

A student takes a 10 question true-false quiz. Because the student had such a busy schedule, he or she 
could not study and randomly guesses at each answer. What is the probability of the student passing the 
test with at least a 70%? 

Small companies might be interested in the number of long distance phone calls their employees make 
during the peak time of the day. Suppose the average is 20 calls. What is the probability that the employees 
make more than 20 long distance phone calls during the peak time? 

These two examples illustrate two different types of probability problems involving discrete random vari- 
ables. Recall that discrete data are data that you can count. A random variable describes the outcomes 
of a statistical experiment in words. The values of a random variable can vary with each repetition of an 
experiment. 

In this chapter, you will study probability problems involving discrete random distributions. You will also 
study long-term averages associated with them. 

5.1.3 Random Variable Notation 

Upper case letters like X or Y denote a random variable. Lower case letters like x or y denote the value of a 
random variable. If X is a random variable, then X is written in words, and x is given as a number. 



1 This content is available online at <http://cnx.org/content/m16825/1.14/>.

For example, let X = the number of heads you get when you toss three fair coins. The sample space for the
toss of three fair coins is TTT; THH; HTH; HHT; HTT; THT; TTH; HHH. Then, x = 0, 1, 2, 3. X is in 
words and x is a number. Notice that for this example, the x values are countable outcomes. Because you 
can count the possible values that X can take on and the outcomes are random (the x values 0, 1, 2, 3), X is 
a discrete random variable. 
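
The three-coin example can be enumerated directly. The sketch below is an illustration, not part of the original text: it lists the eight equally likely outcomes, maps each one to its value x (the number of heads), and tabulates how often each x occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing three fair coins, e.g. "HHT".
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# X = the number of heads; x is the numeric value for one outcome.
heads_count = Counter(outcome.count("H") for outcome in sample_space)

# Probability of each value x: (outcomes with x heads) / (all outcomes).
pdf = {x: Fraction(n, len(sample_space)) for x, n in sorted(heads_count.items())}

print(sample_space)
print(pdf)
```

The keys of `pdf` are exactly the countable values 0, 1, 2, 3 that make X a discrete random variable.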

5.1.4 Optional Collaborative Classroom Activity 

Toss a coin 10 times and record the number of heads. After all members of the class have completed the 
experiment (tossed a coin 10 times and counted the number of heads), fill in the chart using a heading like 
the one below. Let X = the number of heads in 10 tosses of the coin. 



x      Frequency of x      Relative Frequency of x



Table 5.1



Which value(s) of x occurred most frequently? 

If you tossed the coin 1,000 times, what values could x take on? Which value(s) of x do you think would occur most frequently?

What does the relative frequency column sum to? 



5.2 Probability Distribution Function (PDF) for a Discrete Random Variable 2



A discrete probability distribution function has two characteristics: 



• Each probability is between 0 and 1, inclusive.

• The sum of the probabilities is 1.



Example 5.1 

A child psychologist is interested in the number of times a newborn baby's crying wakes its mother 
after midnight. For a random sample of 50 mothers, the following information was obtained. Let 
X = the number of times a newborn wakes its mother after midnight. For this example, x = 0, 1, 2, 
3, 4, 5. 

P(x) = probability that X takes on a value x. 



2 This content is available online at <http://cnx.org/content/m16831/1.14/>.

x     P(x)
0     P(x=0) = 2/50
1     P(x=1) = 11/50
2     P(x=2) = 23/50
3     P(x=3) = 9/50
4     P(x=4) = 4/50
5     P(x=5) = 1/50

Table 5.2

X takes on the values 0, 1, 2, 3, 4, 5. This is a discrete PDF because

1. Each P(x) is between 0 and 1, inclusive.

2. The sum of the probabilities is 1, that is,

2/50 + 11/50 + 23/50 + 9/50 + 4/50 + 1/50 = 1    (5.1)
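
The two characteristics can be checked mechanically. The following sketch is illustrative; the helper name `is_discrete_pdf` is made up here, and the probabilities are the six values from Example 5.1.

```python
from fractions import Fraction

def is_discrete_pdf(probabilities):
    """True if every probability is in [0, 1] and the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    return in_range and sum(probabilities) == 1

# P(x) for x = 0, 1, 2, 3, 4, 5 in Example 5.1, each out of 50 mothers.
counts = [2, 11, 23, 9, 4, 1]
pdf = [Fraction(c, 50) for c in counts]

print(is_discrete_pdf(pdf))                                # the Example 5.1 table
print(is_discrete_pdf([Fraction(1, 2), Fraction(1, 3)]))   # sums to 5/6, not 1
```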



Example 5.2 

Suppose Nancy has classes 3 days a week. She attends classes 3 days a week 80% of the time, 2 
days 15% of the time, 1 day 4% of the time, and no days 1% of the time. Suppose one week is 
randomly selected. 

Problem 1 (Solution on p. 238.)

Let X = the number of days Nancy ____________ .

Problem 2 (Solution on p. 238.)

X takes on what values?



Problem 3 (Solution on p. 238.) 

Suppose one week is randomly chosen. Construct a probability distribution table (called a PDF 
table) like the one in the previous example. The table should have two columns labeled x and P(x). 
What does the P(x) column sum to? 



5.3 Mean or Expected Value and Standard Deviation 3 

The expected value is often referred to as the "long-term" average or mean. This means that over the long
term of doing an experiment over and over, you would expect this average.

The mean of a random variable X is μ. If we do an experiment many times (for instance, flip a fair coin, as
Karl Pearson did, 24,000 times and let X = the number of heads) and record the value of X each time, the
average is likely to get closer and closer to μ as we keep repeating the experiment. This is known as the
Law of Large Numbers.
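
A short simulation can make the Law of Large Numbers concrete. The sketch below is an illustration under assumptions: it uses a pseudo-random fair coin rather than real flips, treats each flip as X = 1 for heads and X = 0 for tails (so μ = 0.5), and reports the running average at a few checkpoints. The function name and the checkpoint values are made up for this sketch.

```python
import random

def running_proportion_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the running average at checkpoints."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5          # counts 1 for a head, 0 for a tail
        if i in (10, 100, 1000, n_flips):
            checkpoints[i] = heads / i       # average of X after i flips
    return checkpoints

averages = running_proportion_of_heads(24000)
for flips, average in averages.items():
    print(flips, average)
```

With more flips the printed averages tend to settle near 0.5, although any single run can wander, especially at the early checkpoints.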

NOTE: To find the expected value or long term average, μ, simply multiply each value of the
random variable by its probability and add the products.



3 This content is available online at <http://cnx.org/content/m16828/1.16/>.

A Step-by-Step Example

A men's soccer team plays soccer 0, 1, or 2 days a week. The probability that they play 0 days is 0.2, the
probability that they play 1 day is 0.5, and the probability that they play 2 days is 0.3. Find the long-term
average, μ, or expected value of the days per week the men's soccer team plays soccer.

To do the problem, first let the random variable X = the number of days the men's soccer team plays soccer
per week. X takes on the values 0, 1, 2. Construct a PDF table, adding a column xP(x). In this column,
you will multiply each x value by its probability.

Expected Value Table

x     P(x)     xP(x)
0     0.2      (0)(0.2) = 0
1     0.5      (1)(0.5) = 0.5
2     0.3      (2)(0.3) = 0.6

Table 5.4: This table is called an expected value table. The table helps you calculate the expected value or
long-term average.



Add the last column to find the long term average or expected value:

(0)(0.2) + (1)(0.5) + (2)(0.3) = 0 + 0.5 + 0.6 = 1.1.

The expected value is 1.1. The men's soccer team would, on the average, expect to play soccer 1.1 days
per week. The number 1.1 is the long term average or expected value if the men's soccer team plays soccer
week after week after week. We say μ = 1.1.
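
The expected value table amounts to one sum of products. The sketch below is illustrative: the helper `expected_value` is a name made up here, the soccer probabilities come from the step-by-step example, and the second distribution is Nancy's class attendance from Example 5.2.

```python
def expected_value(pdf):
    """Multiply each value x by its probability P(x) and add the products."""
    return sum(x * p for x, p in pdf.items())

# x: P(x) for the men's soccer team (days played per week).
soccer = {0: 0.2, 1: 0.5, 2: 0.3}

# x: P(x) for the number of days Nancy attends classes in a week (Example 5.2).
nancy = {3: 0.80, 2: 0.15, 1: 0.04, 0: 0.01}

mu_soccer = expected_value(soccer)
mu_nancy = expected_value(nancy)
print(round(mu_soccer, 2), round(mu_nancy, 2))
```

The values are rounded when printed because decimal probabilities are stored as binary floating-point numbers and may carry tiny rounding errors.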

Example 5.3 

Find the expected value for the example about the number of times a newborn baby's crying 
wakes its mother after midnight. The expected value is the expected number of times a newborn 
wakes its mother after midnight. 



x     P(x)               xP(x)
0     P(x=0) = 2/50      (0)(2/50) = 0
1     P(x=1) = 11/50     (1)(11/50) = 11/50
2     P(x=2) = 23/50     (2)(23/50) = 46/50
3     P(x=3) = 9/50      (3)(9/50) = 27/50
4     P(x=4) = 4/50      (4)(4/50) = 16/50
5     P(x=5) = 1/50      (5)(1/50) = 5/50

Table 5.5: You expect a newborn to wake its mother after midnight 2.1 times, on the average.

Add the last column to find the expected value. μ = Expected Value = 105/50 = 2.1

Problem

Go back and calculate the expected value for the number of days Nancy attends classes a week. 
Construct the third column to do so. 

Solution 

2.74 days a week. 

Example 5.4

Suppose you play a game of chance in which five numbers are chosen from 0, 1, 2, 3, 4, 5, 6, 7, 8,
9. A computer randomly selects five numbers from 0 to 9 with replacement. You pay $2 to play
and could profit $100,000 if you match all 5 numbers in order (you get your $2 back plus $100,000).
Over the long term, what is your expected profit of playing the game?

To do this problem, set up an expected value table for the amount of money you can profit.

Let X = the amount of money you profit. The values of x are not 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Since you
are interested in your profit (or loss), the values of x are 100,000 dollars and -2 dollars.

To win, you must get all 5 numbers correct, in order. The probability of choosing one correct
number is 1/10 because there are 10 numbers. You may choose a number more than once. The
probability of choosing all 5 numbers correctly and in order is:



(1/10)(1/10)(1/10)(1/10)(1/10) = 1 × 10^-5 = 0.00001    (5.2)

Therefore, the probability of winning is 0.00001 and the probability of losing is

1 - 0.00001 = 0.99999    (5.3)

The expected value table is as follows.

          x          P(x)       xP(x)
Loss      -2         0.99999    (-2)(0.99999) = -1.99998
Profit    100,000    0.00001    (100000)(0.00001) = 1

Table 5.6: Add the last column. -1.99998 + 1 = -0.99998

Since -0.99998 is about -1, you would, on the average, expect to lose approximately one dollar
for each game you play. However, each time you play, you either lose $2 or profit $100,000. The $1
is the average or expected LOSS per game after playing this game over and over.
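
The arithmetic in Example 5.4 can be verified with exact fractions. The sketch below is an illustration; the variable names are made up here, and the payoffs and probabilities are the ones stated in the example.

```python
from fractions import Fraction

# Probability of matching all 5 numbers in order, each chosen from 10 digits.
p_win = Fraction(1, 10) ** 5
p_lose = 1 - p_win

# Expected value table rows: (x, P(x)) with x = profit in dollars.
table = [(-2, p_lose), (100_000, p_win)]

# Add the xP(x) column.
expected_profit = sum(x * p for x, p in table)

print(float(p_win), float(p_lose))
print(float(expected_profit))
```

A negative expected profit means an expected loss, which matches the conclusion that you lose about one dollar per game on average.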

Example 5.5 

Suppose you play a game with a biased coin. You play each game by tossing the coin once.
P(heads) = 2/3 and P(tails) = 1/3. If you toss a head, you pay $6. If you toss a tail, you win $10.
If you play this game many times, will you come out ahead?

Problem 1 (Solution on p. 238.)

Define a random variable X.

Problem 2 (Solution on p. 238.)

Complete the following expected value table.

         x       ____     ____
WIN      10      1/3      ____
LOSE     ____    ____     -12/3

Table 5.7

Problem 3 (Solution on p. 238.)

What is the expected value, μ? Do you come out ahead?



Like data, probability distributions have standard deviations. To calculate the standard deviation (σ) of a
probability distribution, find each deviation from its expected value, square it, multiply it by its probability,
add the products, and take the square root. To understand how to do the calculation, look at the table for
the number of days per week a men's soccer team plays soccer. To find the standard deviation, add the
entries in the column labeled (x - μ)² · P(x) and take the square root.



x     P(x)     xP(x)             (x - μ)² P(x)
0     0.2      (0)(0.2) = 0      (0 - 1.1)² (0.2) = 0.242
1     0.5      (1)(0.5) = 0.5    (1 - 1.1)² (0.5) = 0.005
2     0.3      (2)(0.3) = 0.6    (2 - 1.1)² (0.3) = 0.243

Table 5.8

Add the last column in the table. 0.242 + 0.005 + 0.243 = 0.490. The standard deviation is the square root
of 0.49. σ = √0.49 = 0.7
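
The same calculation can be written as a small function. This is a sketch for illustration; `mean_and_sd` is a name made up here, and the distribution is the men's soccer team example.

```python
from math import sqrt

def mean_and_sd(pdf):
    """Return (mu, sigma) for a discrete distribution given as {x: P(x)}."""
    mu = sum(x * p for x, p in pdf.items())
    # Square each deviation from mu, weight it by P(x), add, then take the root.
    variance = sum((x - mu) ** 2 * p for x, p in pdf.items())
    return mu, sqrt(variance)

soccer = {0: 0.2, 1: 0.5, 2: 0.3}
mu, sigma = mean_and_sd(soccer)
print(round(mu, 4), round(sigma, 4))
```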

Generally for probability distributions, we use a calculator or a computer to calculate μ and σ to reduce
roundoff error. For some probability distributions, there are short-cut formulas that calculate μ and σ.

5.4 Common Discrete Probability Distribution Functions 4 

Some of the more common discrete probability functions are binomial, geometric, hypergeometric, and 
Poisson. Most elementary courses do not cover the geometric, hypergeometric, and Poisson. Your instruc- 
tor will let you know if he or she wishes to cover these distributions. 

A probability distribution function is a pattern. You try to fit a probability problem into a pattern or distri- 
bution in order to perform the necessary calculations. These distributions are tools to make solving prob- 
ability problems easier. Each distribution has its own special characteristics. Learning the characteristics 
enables you to distinguish among the different distributions. 

5.5 Binomial 5 



The characteristics of a binomial experiment are: 

1. There are a fixed number of trials. Think of trials as repetitions of an experiment. The letter n denotes
the number of trials.

2. There are only 2 possible outcomes, called "success" and "failure", for each trial. The letter p denotes
the probability of a success on one trial and q denotes the probability of a failure on one trial. p + q = 1.

3. The n trials are independent and are repeated using identical conditions. Because the n trials are
independent, the outcome of one trial does not help in predicting the outcome of another trial. Another
way of saying this is that for each individual trial, the probability, p, of a success and probability, q,
of a failure remain the same. For example, randomly guessing at a true-false statistics question has
only two outcomes. If a success is guessing correctly, then a failure is guessing incorrectly. Suppose
Joe guesses correctly on any statistics true-false question with probability p = 0.6. Then,
q = 0.4. This means that for every true-false statistics question Joe answers, his probability of success
(p = 0.6) and his probability of failure (q = 0.4) remain the same.

4 This content is available online at <http://cnx.org/content/m16821/1.6/>.
5 This content is available online at <http://cnx.org/content/m16820/1.16/>.

The outcomes of a binomial experiment fit a binomial probability distribution. The random variable X = 
the number of successes obtained in the n independent trials. 

The mean, μ, and variance, σ², for the binomial probability distribution are μ = np and σ² = npq. The 
standard deviation, σ, is then σ = √(npq). 
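
As a quick numerical check of these short-cut formulas, the following Python sketch compares them with the general definitions of the mean and variance of a discrete distribution (the sum of x times P(x), and the sum of squared deviations times P(x)). The function names and the values n = 20 and p = 0.41 are illustrative assumptions, and the probability formula used is the standard binomial formula rather than anything specific to this text.

```python
from math import comb, sqrt

def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) from the short-cut formulas."""
    q = 1 - p
    variance = n * p * q
    return n * p, variance, sqrt(variance)

# Illustrative values (assumed for this sketch).
n, p = 20, 0.41
mean, variance, sd = binomial_summary(n, p)

# The same quantities from the general definitions for a discrete distribution.
mean_by_definition = sum(x * binomial_pmf(n, p, x) for x in range(n + 1))
variance_by_definition = sum(
    (x - mean) ** 2 * binomial_pmf(n, p, x) for x in range(n + 1)
)

print("mean:", round(mean, 4), "by definition:", round(mean_by_definition, 4))
print("variance:", round(variance, 4), "by definition:", round(variance_by_definition, 4))
print("standard deviation:", round(sd, 4))
```

The two ways of computing each quantity should agree up to roundoff, which is the point of having a short-cut formula.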

Any experiment that has characteristics 2 and 3 and where n = 1 is called a Bernoulli Trial (named after 
Jacob Bernoulli who, in the late 1600s, studied them extensively). A binomial experiment takes place when 
the number of successes is counted in one or more Bernoulli Trials. 

Example 5.6 

At ABC College, the withdrawal rate from an elementary physics course is 30% for any given 
term. This implies that, for any given term, 70% of the students stay in the class for the entire 
term. A "success" could be defined as an individual who withdrew. The random variable is X = 
the number of students who withdraw from the randomly selected elementary physics class. 

Example 5.7 

Suppose you play a game that you can only either win or lose. The probability that you win any 
game is 55% and the probability that you lose is 45%. Each game you play is independent. If you 
play the game 20 times, what is the probability that you win 15 of the 20 games? Here, if you 
define X = the number of wins, then X takes on the values 0, 1, 2, 3, ..., 20. The probability of a 
success is p = 0.55. The probability of a failure is q = 0.45. The number of trials is n = 20. The 
probability question can be stated mathematically as P (x = 15). 

Example 5.8 

A fair coin is flipped 15 times. Each flip is independent. What is the probability of getting more 
than 10 heads? Let X = the number of heads in 15 flips of the fair coin. X takes on the values 0, 1, 
2, 3, ..., 15. Since the coin is fair, p = 0.5 and q = 0.5. The number of trials is n = 15. The probability 
question can be stated mathematically as P (x > 10). 
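
The probability questions in Examples 5.7 and 5.8 can also be evaluated on a computer. The sketch below is one way to do it in Python, assuming the standard binomial probability formula; the function and variable names are hypothetical and chosen only for this illustration.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Example 5.7: n = 20 games, p = 0.55, probability of exactly 15 wins.
p_win_15 = binomial_pmf(20, 0.55, 15)

# Example 5.8: n = 15 flips, p = 0.5, probability of more than 10 heads,
# that is, P(x = 11) + P(x = 12) + ... + P(x = 15).
p_more_than_10_heads = sum(binomial_pmf(15, 0.5, x) for x in range(11, 16))

print(f"P(x = 15) = {p_win_15:.4f}")
print(f"P(x > 10) = {p_more_than_10_heads:.4f}")
```

Note that "more than 10" excludes 10 itself, so the sum starts at 11.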

Example 5.9 

Approximately 70% of statistics students do their homework in time for it to be collected and 
graded. Each student does homework independently. In a statistics class of 50 students, what is 
the probability that at least 40 will do their homework on time? Students are selected randomly. 

Problem 1 (Solution on p. 238.) 

This is a binomial problem because there is only a success or a __________, there are a definite 
number of trials, and the probability of a success is 0.70 for each trial. 

Problem 2 (Solution on p. 238.) 

If we are interested in the number of students who do their homework, then how do we define 
X? 

Problem 3 (Solution on p. 238.) 

What values does x take on? 

Problem 4 (Solution on p. 238.) 

What is a "failure", in words? 

The probability of a success is p = 0.70. The number of trials is n = 50. 

Problem 5 (Solution on p. 238.) 

If p + q = 1, then what is q? 

Problem 6 (Solution on p. 238.) 

The words "at least" translate as what kind of inequality for the probability question P (x ____ 40)? 

The probability question is P (x ≥ 40). 



5.5.1 Notation for the Binomial: B = Binomial Probability Distribution Function 

X~B(n,p) 

Read this as "X is a random variable with a binomial distribution." The parameters are n and p: n = number 
of trials, p = probability of a success on each trial. 

Example 5.10 

It has been stated that about 41% of adult workers have a high school diploma but do not pursue 
any further education. If 20 adult workers are randomly selected, find the probability that at most 
12 of them have a high school diploma but do not pursue any further education. How many adult 
workers do you expect to have a high school diploma but do not pursue any further education? 

Let X = the number of workers who have a high school diploma but do not pursue any further 
education. 

X takes on the values 0, 1, 2, ..., 20 where n = 20 and p = 0.41. q = 1 - 0.41 = 0.59. X ~ B (20,0.41) 

Find P (x ≤ 12). P (x ≤ 12) = 0.9738. (calculator or computer) 

Using the TI-83+ or the TI-84 calculators, the calculations are as follows. Go into 2nd DISTR. The 
syntax for the instructions is as follows: 

To calculate P (x = value): binompdf(n, p, number). If "number" is left out, the result is the binomial 
probability table. 

To calculate P (x ≤ value): binomcdf(n, p, number). If "number" is left out, the result is the 
cumulative binomial probability table. 

For this problem: After you are in 2nd DISTR, arrow down to A:binomcdf. Press ENTER. Enter 
20,.41,12). The result is P (x ≤ 12) = 0.9738. 

NOTE: If you want to find P (x = 12), use the pdf (0:binompdf). If you want to find P (x > 12), 
use 1 - binomcdf(20,.41,12). 

The probability that at most 12 workers have a high school diploma but do not pursue any further 
education is 0.9738. 
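
For readers working on a computer instead of a calculator, the sketch below mirrors the two calculator functions in Python. The Python functions here are hypothetical helpers written for this illustration (they borrow the calculator names only for readability) and assume the standard binomial probability formula.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x), analogous to the calculator's binompdf(n, p, number)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x), analogous to the calculator's binomcdf(n, p, number)."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p = 20, 0.41
at_most_12 = binomcdf(n, p, 12)         # P(x <= 12); the text reports 0.9738
exactly_12 = binompdf(n, p, 12)         # P(x = 12)
more_than_12 = 1 - binomcdf(n, p, 12)   # P(x > 12), by the complement rule

print("P(x <= 12) =", round(at_most_12, 4))
print("P(x = 12)  =", round(exactly_12, 4))
print("P(x > 12)  =", round(more_than_12, 4))
```

The cumulative function includes the endpoint, which is why "at most 12" uses 12 directly and "more than 12" uses the complement.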

The graph of X ~ B (20,0.41) is: 

[Figure: graph of the binomial distribution B (20,0.41).] 

The y-axis contains the probability of x, where X = the number of workers who have only a high 
school diploma. 

The number of adult workers that you expect to have a high school diploma but not pursue any 
further education is the mean, μ = np = (20) (0.41) = 8.2. 

The formula for the variance is σ² = npq. The standard deviation is σ = √(npq) = 
√((20) (0.41) (0.59)) = 2.20. 

Example 5.11 

The following example illustrates a problem that is not binomial. It violates the condition of in- 
dependence. ABC College has a student advisory committee made up of 10 staff members and 
6 students. The committee wishes to choose a chairperson and a recorder. What is the proba- 
bility that the chairperson and recorder are both students? All names of the committee are put 
into a box and two names are drawn without replacement. The first name drawn determines the 
chairperson and the second name the recorder. There are two trials. However, the trials are not 
independent because the outcome of the first trial affects the outcome of the second trial. The 
probability of a student on the first draw is 6/16. The probability of a student on the second draw 
is 5/15, when the first draw produces a student. The probability is 6/15 when the first draw produces 
a staff member. The probability of drawing a student's name changes for each of the trials and, 
therefore, violates the condition of independence. 
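
The committee example can be checked with exact fractions. The Python sketch below follows the same reasoning (16 names, 6 of them students, two draws without replacement) and uses the multiplication rule for dependent events; the variable names are illustrative and not part of the example.

```python
from fractions import Fraction

staff, students = 10, 6
total = staff + students

# First draw: 6 student names among 16.
p_first_student = Fraction(students, total)

# Second draw depends on the first because the first name is not replaced.
p_second_student_given_student = Fraction(students - 1, total - 1)
p_second_student_given_staff = Fraction(students, total - 1)

# Multiplication rule for dependent events:
# P(both students) = P(first is student) * P(second is student | first is student).
p_both_students = p_first_student * p_second_student_given_student

print("P(student on first draw):", p_first_student)
print("P(student on second | student first):", p_second_student_given_student)
print("P(student on second | staff first):", p_second_student_given_staff)
print("P(chairperson and recorder are both students):", p_both_students)
```

Fraction reduces each value to lowest terms, so 6/16 prints as 3/8. The two conditional probabilities differ, which is exactly the violation of independence described above.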

5.6 Summary of Functions 6 



Formula 5.1: Binomial 
X~B(n,p) 

X = the number of successes in n independent trials 

n = the number of independent trials 

X takes on the values x = 0, 1, 2, 3, ..., n 

p = the probability of a success for any trial 

q = the probability of a failure for any trial 

p + q = 1 

q = 1 - p 

The mean is μ = np. The standard deviation is σ = √(npq). 

Formula 5.2: Geometric 
X~G(p) 

X = the number of independent trials until the first success (count the failures and the first success) 

X takes on the values x= 1, 2, 3, ... 

p = the probability of a success for any trial 

q = the probability of a failure for any trial 

p + q = 1 

q = 1 - p 

The mean is μ = 1/p. 

The standard deviation is σ = √((1/p) ((1/p) - 1)). 
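
The geometric summary can be illustrated with a short Python sketch. It assumes the standard geometric probability, P(X = x) = q^(x - 1) p (x - 1 failures followed by the first success), which matches the definition of X given here; the value p = 0.25 and the function names are assumptions made only for the illustration.

```python
from math import sqrt

def geometric_pmf(p, x):
    """P(X = x): x - 1 failures followed by the first success on trial x."""
    return (1 - p) ** (x - 1) * p

def geometric_cdf(p, x):
    """P(X <= x): the first success occurs on or before trial x."""
    return 1 - (1 - p) ** x

def geometric_summary(p):
    """Return (mean, standard deviation) for X ~ G(p)."""
    mean = 1 / p
    sd = sqrt((1 / p) * ((1 / p) - 1))
    return mean, sd

p = 0.25  # assumed probability of a success on any trial
mean, sd = geometric_summary(p)

print("mean number of trials until the first success:", mean)
print("standard deviation:", round(sd, 4))
print("P(X = 3) =", geometric_pmf(p, 3))
print("P(X <= 2) =", geometric_cdf(p, 2))
```

Unlike the binomial, X has no largest value here, so cumulative probabilities are easier to get from the complement (no success in the first x trials) than by summing.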

Formula 5.3: Hypergeometric 

X~H(r,b,n) 

X = the number of items from the group of interest that are in the chosen sample. 

X may take on the values x = 0, 1, ..., up to the size of the group of interest. (The minimum value 
for X may be larger than 0 in some instances.) 

r = the size of the group of interest (first group) 

b = the size of the second group 

n = the size of the chosen sample 

n ≤ r + b 

The mean is: μ = nr / (r + b) 

The standard deviation is: σ = √(rbn (r + b - n) / ((r + b)² (r + b - 1))) 

6 This content is available online at <http://cnx.org/content/m16833/1.10/>. 
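
A minimal Python sketch of the hypergeometric summary follows. It assumes the standard counting formula for the probability (choose x items from the group of interest and n - x from the second group, out of all ways to choose n items) and, as an illustration, reuses the committee of Example 5.11 with r = 6 students, b = 10 staff members, and n = 2 names drawn. The function names are hypothetical.

```python
from math import comb, sqrt

def hypergeometric_pmf(r, b, n, x):
    """P(X = x): x items from the group of interest (size r), n - x from the second group (size b)."""
    if x < 0 or x > r or n - x < 0 or n - x > b:
        return 0.0  # impossible counts have probability zero
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

def hypergeometric_summary(r, b, n):
    """Return (mean, standard deviation) for X ~ H(r, b, n)."""
    mean = n * r / (r + b)
    sd = sqrt(r * b * n * (r + b - n) / ((r + b) ** 2 * (r + b - 1)))
    return mean, sd

# Committee of Example 5.11: 6 students (group of interest), 10 staff, 2 names drawn.
r, b, n = 6, 10, 2
mean, sd = hypergeometric_summary(r, b, n)

for x in range(n + 1):
    print("P(X =", x, ") =", round(hypergeometric_pmf(r, b, n, x), 4))
print("mean:", mean, "standard deviation:", round(sd, 4))
```

With these values, P(X = 2) is the probability that both names drawn are students, the question posed in Example 5.11.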

Formula 5.4: Poisson 
X ~ P(μ) 

X = the number of occurrences in the interval of interest 

X takes on the values x = 0, 1, 2, 3, ... 

The mean μ is typically given. (λ is often used as the mean instead of μ.) When the Poisson is 
used to approximate the binomial, we use the binomial mean μ = np. n is the binomial number 
of trials. p = the probability of a success for each trial. This formula is valid when n is "large" and 
p "small" (a general rule is that n should be greater than or equal to 20 and p should be less than 
or equal to 0.05). If n is large enough and p is small enough then the Poisson approximates the 
binomial very well. The variance is σ² = μ and the standard deviation is σ = √μ. 
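
The quality of the Poisson approximation can be seen by placing the two sets of probabilities side by side. The Python sketch below assumes the standard Poisson probability formula and uses illustrative values n = 100 and p = 0.02, chosen only because they satisfy the general rule stated above; the function names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(mu, x):
    """P(X = x) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed values for illustration: n >= 20 and p <= 0.05.
n, p = 100, 0.02
mu = n * p  # the binomial mean is used as the Poisson mean

print("x  binomial  Poisson")
for x in range(6):
    print(x, round(binomial_pmf(n, p, x), 4), round(poisson_pmf(mu, x), 4))

# Largest disagreement between the two distributions over all possible x.
largest_gap = max(
    abs(binomial_pmf(n, p, x) - poisson_pmf(mu, x)) for x in range(n + 1)
)
print("largest difference:", round(largest_gap, 4))
```

Trying a larger p with the same n (for example p = 0.4) is a simple way to see the approximation degrade when the rule is not met.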

5.7 Practice 1: Discrete Distribution 7 

5.7.1 Student Learning Outcomes 

• The student will analyze the properties of a discrete distribution. 

5.7.2 Given: 

A ballet instructor is interested in knowing what percent of each year's class will continue on to the next, 
so that she can plan what classes to offer. Over the years, she has established the following probability 
distribution. 

• Let X = the number of years a student will study ballet with the teacher. 

• Let P (x) = the probability that a student will study ballet x years. 

5.7.3 Organize the Data 

Complete the table below using the data provided. 



x      P(x)     x*P(x) 
1      0.10 
2      0.05 
3      0.10 
4 
5      0.30 
6      0.20 
7      0.10 

Table 5.9 



Exercise 5.7.1 

In words, define the Random Variable X. 

Exercise 5.7.2 

P(x = 4) 

Exercise 5.7.3 

P(x < 4) 

Exercise 5.7.4 

On average, how many years would you expect a child to study ballet with this teacher? 

5.7.4 Discussion Question 

Exercise 5.7.5 

What does the column "P(x)" sum to and why? 

Exercise 5.7.6 

What does the column "x * P(x)" sum to and why? 



7 This content is available online at <http://cnx.org/content/m16830/1.14/>. 

5.8 Practice 2: Binomial Distribution 8 

5.8.1 Student Learning Outcomes 

• The student will construct the Binomial Distribution. 

5.8.2 Given 

The Higher Education Research Institute at UCLA collected data from 203,967 incoming first-time, 
full-time freshmen from 270 four-year colleges and universities in the U.S. 71.3% of those students replied 
that, yes, they believe that same-sex couples should have the right to legal marital status. (Source: 
http://heri.ucla.edu/PDFs/pubs/TFS/Norms/Monographs/TheAmericanFreshman2011.pdf) 

Suppose that you randomly pick 8 first-time, full-time freshmen from the survey. You are interested 
in the number that believe that same-sex couples should have the right to legal marital status. 



5.8.3 Interpret the Data 

Exercise 5.8.1 (Solution on p. 239.) 

In words, define the random variable X. 

Exercise 5.8.2 (Solution on p. 239.) 

X~ 

Exercise 5.8.3 (Solution on p. 239.) 

What values does the random variable X take on? 

Exercise 5.8.4 

Construct the probability distribution function (PDF). 

x      P(x) 
____   ____ 
____   ____ 
____   ____ 
____   ____ 
____   ____ 
____   ____ 
____   ____ 
____   ____ 
____   ____ 

Table 5.10 

Exercise 5.8.5 (Solution on p. 239.) 

On average (μ), how many would you expect to answer yes? 

Exercise 5.8.6 (Solution on p. 239.) 

What is the standard deviation (σ)? 

Exercise 5.8.7 (Solution on p. 239.) 

What is the probability that at most 5 of the freshmen reply "yes"? 

Exercise 5.8.8 (Solution on p. 239.) 

What is the probability that at least 2 of the freshmen reply "yes"? 

Exercise 5.8.9 

Construct a histogram or plot a line graph. Label the horizontal and vertical axes with words. 
Include numerical scaling. 

8 This content is available online at <http://cnx.org/content/m17107/1.18/>. 

5.9 Homework 9 



Exercise 5.9.1 (Solution on p. 239.) 

Complete the PDF and answer the questions. 

x      P(X = x)     x·P(X = x) 
0      0.3 
1      0.2 
2 
3      0.4 

Table 5.11 



a. Find the probability that x = 2. 

b. Find the expected value. 

Exercise 5.9.2 

Suppose that you are offered the following "deal." You roll a die. If you roll a 6, you win $10. If 
you roll a 4 or 5, you win $5. If you roll a 1, 2, or 3, you pay $6. 

a. What are you ultimately interested in here (the value of the roll or the money you win)? 

b. In words, define the Random Variable X. 

c. List the values that X may take on. 

d. Construct a PDF. 

e. Over the long run of playing this game, what are your expected average winnings per game? 

f. Based on numerical values, should you take the deal? Explain your decision in complete sentences. 



Exercise 5.9.3 (Solution on p. 239.) 

A venture capitalist, willing to invest $1,000,000, has three investments to choose from. The first 
investment, a software company, has a 10% chance of returning $5,000,000 profit, a 30% chance of 
returning $1,000,000 profit, and a 60% chance of losing the million dollars. The second company, 
a hardware company, has a 20% chance of returning $3,000,000 profit, a 40% chance of returning 
$1,000,000 profit, and a 40% chance of losing the million dollars. The third company, a biotech 
firm, has a 10% chance of returning $6,000,000 profit, a 70% chance of no profit or loss, and a 20% chance 
of losing the million dollars. 

a. Construct a PDF for each investment. 

b. Find the expected value for each investment. 

c. Which is the safest investment? Why do you think so? 

d. Which is the riskiest investment? Why do you think so? 

e. Which investment has the highest expected return, on average? 

Exercise 5.9.4 

A theater group holds a fund-raiser. It sells 100 raffle tickets for $5 apiece. Suppose you purchase 
4 tickets. The prize is 2 passes to a Broadway show, worth a total of $150. 

a. What are you interested in here? 

b. In words, define the Random Variable X. 

c. List the values that X may take on. 

d. Construct a PDF. 

e. If this fund-raiser is repeated often and you always purchase 4 tickets, what would be your 
expected average winnings per raffle? 

9 This content is available online at <http://cnx.org/content/m16823/1.20/>. 

Exercise 5.9.5 (Solution on p. 239.) 

Suppose that 20,000 married adults in the United States were randomly surveyed as to the number 
of children they have. The results are compiled and are used as theoretical probabilities. Let X = 
the number of children. 

x              P(X = x)     x·P(X = x) 
0              0.10 
1              0.20 
2              0.30 
3 
4              0.10 
5              0.05 
6 (or more)    0.05 

Table 5.12 

a. Find the probability that a married adult has 3 children. 

b. In words, what does the expected value in this example represent? 

c. Find the expected value. 

d. Is it more likely that a married adult will have 2 - 3 children or 4 - 6 children? How do you 
know? 

Exercise 5.9.6 

Suppose that the PDF for the number of years it takes to earn a Bachelor of Science (B.S.) degree 
is given below. 



x      P(X = x) 
3      0.05 
4      0.40 
5      0.30 
6      0.15 
7      0.10 

Table 5.13 



a. In words, define the Random Variable X. 

b. What does it mean that the values 0, 1, and 2 are not included for x in the PDF? 

c. On average, how many years do you expect it to take for an individual to earn a B.S.? 

5.9.1 For each problem: 

a. In words, define the Random Variable X. 

b. List the values that X may take on. 

c. Give the distribution of X. X~ 

Then, answer the questions specific to each individual problem. 

Exercise 5.9.7 (Solution on p. 239.) 

Six different colored dice are rolled. Of interest is the number of dice that show a "1." 

d. On average, how many dice would you expect to show a "1"? 

e. Find the probability that all six dice show a "1." 

f. Is it more likely that 3 or that 4 dice will show a "1"? Justify your answer numerically. 

Exercise 5.9.8 

More than 96 percent of the very largest colleges and universities (more than 15,000 to- 
tal enrollments) have some online offerings. Suppose you randomly pick 13 such insti- 
tutions. We are interested in the number that offer distance learning courses. (Source: 
http://en.wikipedia.org/wiki/Distance_education) 

d. On average, how many schools would you expect to offer such courses? 

e. Find the probability that at most 6 offer such courses. 

f. Is it more likely that or that 13 will offer such courses? Use numbers to justify your answer 

numerically and answer in a complete sentence. 

Exercise 5.9.9 (Solution on p. 239.) 

A school newspaper reporter decides to randomly survey 12 students to see if they will attend Tet 
(Vietnamese New Year) festivities this year. Based on past years, she knows that 18% of students 
attend Tet festivities. We are interested in the number of students who will attend the festivities. 

d. How many of the 12 students do we expect to attend the festivities? 

e. Find the probability that at most 4 students will attend. 

f. Find the probability that more than 2 students will attend. 

Exercise 5.9.10 

Suppose that about 85% of graduating students attend their graduation. A group of 22 graduating 
students is randomly chosen. 

d. How many are expected to attend their graduation? 

e. Find the probability that 17 or 18 attend. 

f. Based on numerical values, would you be surprised if all 22 attended graduation? Justify your 
answer numerically. 

Exercise 5.9.11 (Solution on p. 239.) 

At The Fencing Center, 60% of the fencers use the foil as their main weapon. We randomly survey 
25 fencers at The Fencing Center. We are interested in the numbers that do not use the foil as their 
main weapon. 

d. How many are expected to not use the foil as their main weapon? 

e. Find the probability that six do not use the foil as their main weapon. 

f. Based on numerical values, would you be surprised if all 25 did not use foil as their main 
weapon? Justify your answer numerically. 

Exercise 5.9.12 

Approximately 8% of students at a local high school participate in after-school sports all four 
years of high school. A group of 60 seniors is randomly chosen. Of interest is the number that 
participated in after-school sports all four years of high school. 

d. How many seniors are expected to have participated in after-school sports all four years of high 
school? 

e. Based on numerical values, would you be surprised if none of the seniors participated in 
after-school sports all four years of high school? Justify your answer numerically. 

f. Based upon numerical values, is it more likely that 4 or that 5 of the seniors participated in 
after-school sports all four years of high school? Justify your answer numerically. 

Exercise 5.9.13 (Solution on p. 240.) 

The chance of having an extra fortune in a fortune cookie is about 3%. Given a bag of 144 fortune 
cookies, we are interested in the number of cookies with an extra fortune. Two distributions may 
be used to solve this problem. Use one distribution to solve the problem. 

d. How many cookies do we expect to have an extra fortune? 

e. Find the probability that none of the cookies have an extra fortune. 

f. Find the probability that more than 3 have an extra fortune. 

g. As n increases, what happens involving the probabilities using the two distributions? Explain 

in complete sentences. 

Exercise 5.9.14 

There are two games played for Chinese New Year and Vietnamese New Year. They are almost 
identical. In the Chinese version, fair dice with numbers 1, 2, 3, 4, 5, and 6 are used, along with 
a board with those numbers. In the Vietnamese version, fair dice with pictures of a gourd, fish, 
rooster, crab, crayfish, and deer are used. The board has those six objects on it, also. We will play 
with bets being $1. The player places a bet on a number or object. The "house" rolls three dice. If 
none of the dice show the number or object that was bet, the house keeps the $1 bet. If one of the 
dice shows the number or object bet (and the other two do not show it), the player gets back his 
$1 bet, plus $1 profit. If two of the dice show the number or object bet (and the third die does not 
show it), the player gets back his $1 bet, plus $2 profit. If all three dice show the number or object 
bet, the player gets back his $1 bet, plus $3 profit. 

Let X = number of matches and Y= profit per game. 

d. List the values that Y may take on. Then, construct one PDF table that includes both X & Y and 
their probabilities. 

e. Calculate the average expected matches over the long run of playing this game for the player. 

f. Calculate the average expected earnings over the long run of playing this game for the player. 

g. Determine who has the advantage, the player or the house. 

Exercise 5.9.15 (Solution on p. 240.) 

According to the South Carolina Department of Mental Health web site, for 
every 200 U.S. women, the average number who suffer from anorexia is one 
(http://www.state.sc.us/dmh/anorexia/statistics.htm). Out of a randomly chosen group of 
600 U.S. women: 

d. How many are expected to suffer from anorexia? 

e. Find the probability that no one suffers from anorexia. 

f. Find the probability that more than four suffer from anorexia. 

Exercise 5.9.16 

The average number of children a Japanese woman has in her lifetime is 1.37. Suppose that one 
Japanese woman is randomly chosen. 
(http://www.mhlw.go.jp/english/policy/children/children-childrearing/index.html, 
MHLW's Pamphlet) 

d. Find the probability that she has no children. 

e. Find the probability that she has fewer children than the Japanese average. 

f. Find the probability that she has more children than the Japanese average. 

Exercise 5.9.17 (Solution on p. 240.) 

The average number of children a Spanish woman has in her lifetime 
is 1.47. Suppose that one Spanish woman is randomly chosen. 
(http://www.typicallyspanish.com/news/publish/article_4897.shtml) 

d. Find the probability that she has no children. 

e. Find the probability that she has fewer children than the Spanish average. 

f. Find the probability that she has more children than the Spanish average. 

Exercise 5.9.18 

Fertile (female) cats produce an average of 3 litters per year. (Source: The Humane Society of 
the United States). Suppose that one fertile, female cat is randomly chosen. In one year, find the 
probability she produces: 

d. No litters. 

e. At least 2 litters. 

f. Exactly 3 litters. 

Exercise 5.9.19 (Solution on p. 240.) 

A consumer looking to buy a used red Miata car will call dealerships until she finds a dealership 
that carries the car. She estimates the probability that any independent dealership will have the 
car will be 28%. We are interested in the number of dealerships she must call. 

d. On average, how many dealerships would we expect her to have to call until she finds one that 

has the car? 

e. Find the probability that she must call at most 4 dealerships. 

f. Find the probability that she must call 3 or 4 dealerships. 

Exercise 5.9.20 

Suppose that the probability that an adult in America will watch the Super Bowl is 40%. Each 
person is considered independent. We are interested in the number of adults in America we must 
survey until we find one who will watch the Super Bowl. 

d. How many adults in America do you expect to survey until you find one who will watch the 

Super Bowl? 

e. Find the probability that you must ask 7 people. 

f. Find the probability that you must ask 3 or 4 people. 


Exercise 5.9.21 (Solution on p. 240.) 

A group of Martial Arts students is planning on participating in an upcoming demonstration. 
6 are students of Tae Kwon Do; 7 are students of Shotokan Karate. Suppose that 8 students are 
randomly picked to be in the first demonstration. We are interested in the number of Shotokan 
Karate students in that first demonstration. Hint: Use the Hypergeometric distribution. Look in 
the Formulas section of 4: Discrete Distributions and in the Appendix Formulas. 

d. How many Shotokan Karate students do we expect to be in that first demonstration? 

e. Find the probability that 4 students of Shotokan Karate are picked for the first demonstration. 

f. Suppose that we are interested in the Tae Kwon Do students that are picked for the first 
demonstration. Find the probability that all 6 students of Tae Kwon Do are picked for the first 
demonstration. 

Exercise 5.9.22 

The chance of an IRS audit for a tax return with over $25,000 in income is about 2% per year. We 
are interested in the expected number of audits a person with that income has in a 20 year period. 
Assume each year is independent. 

d. How many audits are expected in a 20 year period? 

e. Find the probability that a person is not audited at all. 

f. Find the probability that a person is audited more than twice. 

Exercise 5.9.23 (Solution on p. 240.) 

Refer to the previous problem. Suppose that 100 people with tax returns over $25,000 are ran- 
domly picked. We are interested in the number of people audited in 1 year. One way to solve this 
problem is by using the Binomial Distribution. Since n is large and p is small, another discrete 
distribution could be used to solve the following problems. Solve the following questions (d-f) 
using that distribution. 

d. How many are expected to be audited? 

e. Find the probability that no one was audited. 

f. Find the probability that more than 2 were audited. 

Exercise 5.9.24 

Suppose that a technology task force is being formed to study technology awareness among in- 
structors. Assume that 10 people will be randomly chosen to be on the committee from a group 
of 28 volunteers, 20 who are technically proficient and 8 who are not. We are interested in the 
number on the committee who are not technically proficient. 

d. How many instructors do you expect on the committee who are not technically proficient? 

e. Find the probability that at least 5 on the committee are not technically proficient. 

f. Find the probability that at most 3 on the committee are not technically proficient. 

Exercise 5.9.25 (Solution on p. 240.) 

Refer back to Exercise 4.15.12. Solve this problem again, using a different, though still acceptable, 
distribution. 

Exercise 5.9.26 

Suppose that 9 Massachusetts athletes are scheduled to appear at a charity benefit. The 9 are ran- 
domly chosen from 8 volunteers from the Boston Celtics and 4 volunteers from the New England 
Patriots. We are interested in the number of Patriots picked. 

d. Is it more likely that there will be 2 Patriots or 3 Patriots picked? 

e. What is the probability that all of the volunteers will be from the Celtics? 

f. Is it more likely that more of the volunteers will be from the Patriots or from the Celtics? How 
do you know? 

Exercise 5.9.27 (Solution on p. 241.) 

On average, Pierre, an amateur chef, drops 3 pieces of egg shell into every 2 batters of cake he 
makes. Suppose that you buy one of his cakes. 

d. On average, how many pieces of egg shell do you expect to be in the cake? 

e. What is the probability that there will not be any pieces of egg shell in the cake? 

f. Let's say that you buy one of Pierre's cakes each week for 6 weeks. What is the probability that 
there will not be any egg shell in any of the cakes? 

g. Based upon the average given for Pierre, is it possible for there to be 7 pieces of shell in the 
cake? Why? 

Exercise 5.9.28 

It has been estimated that only about 30% of California residents have adequate earthquake sup- 
plies. Suppose we are interested in the number of California residents we must survey until we 
find a resident who does not have adequate earthquake supplies. 

d. What is the probability that we must survey just 1 or 2 residents until we find a California 
resident who does not have adequate earthquake supplies? 

e. What is the probability that we must survey at least 3 California residents until we find a 
California resident who does not have adequate earthquake supplies? 

f. How many California residents do you expect to need to survey until you find a California 
resident who does not have adequate earthquake supplies? 

g. How many California residents do you expect to need to survey until you find a California 
resident who does have adequate earthquake supplies? 

Exercise 5.9.29 (Solution on p. 241.) 

Refer to the above problem. Suppose you randomly survey 11 California residents. We are inter- 
ested in the number who have adequate earthquake supplies. 

d. What is the probability that at least 8 have adequate earthquake supplies? 

e. Is it more likely that none or that all of the residents surveyed will have adequate earthquake 
supplies? Why? 

f. How many residents do you expect will have adequate earthquake supplies? 

The next 2 questions refer to the following: In one of its Spring catalogs, L.L. Bean® advertised footwear on 
29 of its 192 catalog pages. 

Exercise 5.9.30 

Suppose we randomly survey 20 pages. We are interested in the number of pages that advertise 
footwear. Each page may be picked at most once. 

d. How many pages do you expect to advertise footwear on them? 

e. Is it probable that all 20 will advertise footwear on them? Why or why not? 

f. What is the probability that less than 10 will advertise footwear on them? 

Exercise 5.9.31 (Solution on p. 241.) 

Suppose we randomly survey 20 pages. We are interested in the number of pages that advertise 
footwear. This time, each page may be picked more than once. 

d. How many pages do you expect to advertise footwear on them? 

e. Is it probable that all 20 will advertise footwear on them? Why or why not? 

f. What is the probability that less than 10 will advertise footwear on them? 

g. Reminder: A page may be picked more than once. We are interested in the number of pages 
that we must randomly survey until we find one that has footwear advertised on it. Define 
the random variable X and give its distribution. 

h. What is the probability that you only need to survey at most 3 pages in order to find one that 
advertises footwear on it? 

i. How many pages do you expect to need to survey in order to find one that advertises footwear? 

Exercise 5.9.32 

Suppose that you roll a fair die until each face has appeared at least once. It does not matter in 
what order the numbers appear. Find the expected number of rolls you must make until each face 
has appeared at least once. 



5.9.2 Try these multiple choice problems. 

For the next three problems: The probability that the San Jose Sharks will win any given game is 0.3694 
based on a 13 year win history of 382 wins out of 1034 games played (as of a certain date). An upcoming 
monthly schedule contains 12 games. 
Let X = the number of games won in that upcoming month. 

Exercise 5.9.33 (Solution on p. 241.) 

The expected number of wins for that upcoming month is: 

A. 1.67 

B. 12 

C. 382/1043 

D. 4.43 

Exercise 5.9.34 (Solution on p. 241.) 

What is the probability that the San Jose Sharks win 6 games in that upcoming month? 

A. 0.1476 

B. 0.2336 

C. 0.7664 

D. 0.8903 

Exercise 5.9.35 (Solution on p. 241.) 

What is the probability that the San Jose Sharks win at least 5 games in that upcoming month? 

A. 0.3694 

B. 0.5266 

C. 0.4734 

D. 0.2305 

For the next two questions: The average number of times per week that Mrs. Plum's cats wake her up at 
night because they want to play is 10. We are interested in the number of times her cats wake her up each 
week. 

Exercise 5.9.36 (Solution on p. 241.) 

In words, the random variable X =

A. The number of times Mrs. Plum's cats wake her up each week

B. The number of times Mrs. Plum's cats wake her up each hour 

C. The number of times Mrs. Plum's cats wake her up each night 

D. The number of times Mrs. Plum's cats wake her up 

Exercise 5.9.37 (Solution on p. 241.) 

Find the probability that her cats will wake her up no more than 5 times next week. 

A. 0.5000 

B. 0.9329 

C. 0.0378 

D. 0.0671 

Exercise 5.9.38 (Solution on p. 241.) 

People visiting video rental stores often rent more than one DVD at a time. The probability 
distribution for DVD rentals per customer at Video To Go is given below. There is a 5 video limit
per customer at this store, so nobody ever rents more than 5 DVDs. 



x         0       1       2       3       4       5
P(X=x)    0.03    0.50    0.24    ?       0.07    0.04

Table 5.14



A. Describe the random variable X in words. 

B. Find the probability that a customer rents three DVDs. 

C. Find the probability that a customer rents at least 4 DVDs. 

D. Find the probability that a customer rents at most 2 DVDs. 

Another shop, Entertainment Headquarters, rents DVDs and videogames. The probability distri- 
bution for DVD rentals per customer at this shop is given below. They also have a 5 DVD limit per 
customer. 



x         0       1       2       3       4       5
P(X=x)    0.35    0.25    0.20    0.10    0.05    0.05

Table 5.15

E. At which store is the expected number of DVDs rented per customer higher? 

F. If Video to Go estimates that they will have 300 customers next week, how many DVDs do they
expect to rent next week? Answer in sentence form.

G. If Video to Go expects 300 customers next week and Entertainment HQ projects that they will
have 420 customers, for which store is the expected number of DVD rentals for next week
higher? Explain.

H. Which of the two video stores experiences more variation in the number of DVD rentals per 
customer? How do you know that? 



Exercise 5.9.39 (Solution on p. 241.) 

A game involves selecting a card from a deck of cards and tossing a coin. The deck has 52 cards,
and 12 cards are "face cards" (Jack, Queen, or King). The coin is a fair coin and is equally likely to
land on Heads or Tails.

• If the card is a face card and the coin lands on Heads, you win $6

• If the card is a face card and the coin lands on Tails, you win $2 

• If the card is not a face card, you lose $2, no matter what the coin shows. 

A. Find the expected value for this game (expected net gain or loss). 

B. Explain what your calculations indicate about your long-term average profits and losses on 

this game. 

C. Should you play this game to win money? 

Exercise 5.9.40 (Solution on p. 242.) 

You buy a lottery ticket to a lottery that costs $10 per ticket. There are only 100 tickets available
to be sold in this lottery. In this lottery there are one $500 prize, two $100 prizes, and four $25 prizes. Find
your expected gain or loss. 

Exercise 5.9.41 (Solution on p. 242.) 

A student takes a 10 question true-false quiz, but did not study and randomly guesses each an- 
swer. Find the probability that the student passes the quiz with a grade of at least 70% of the 
questions correct. 

Exercise 5.9.42 (Solution on p. 242.) 

A student takes a 32 question multiple choice exam, but did not study and randomly guesses each 
answer. Each question has 3 possible choices for the answer. Find the probability that the student 
guesses more than 75% of the questions correctly. 

Exercise 5.9.43 (Solution on p. 243.) 

Suppose that you are performing the probability experiment of rolling one fair six-sided die. Let F
be the event of rolling a "4" or a "5". You are interested in how many times you need to roll the die 
in order to obtain the first "4 or 5" as the outcome. 

• p = probability of success (event F occurs) 

• q = probability of failure (event F does not occur) 

A. Write the description of the random variable X. What are the values that X can take on? Find
the values of p and q.

B. Find the probability that the first occurrence of event F (rolling a "4" or "5") is on the second
trial.

C. How many trials would you expect until you roll a "4" or "5"? 

**Exercises 38 - 43 contributed by Roberta Bloom

5.10 Review 13

The next two questions refer to the following: 

A recent poll concerning credit cards found that 35 percent of respondents use a credit card that gives them 
a mile of air travel for every dollar they charge. Thirty percent of the respondents charge more than $2000 
per month. Of those respondents who charge more than $2000, 80 percent use a credit card that gives them 
a mile of air travel for every dollar they charge. 

Exercise 5.10.1 (Solution on p. 243.) 

What is the probability that a randomly selected respondent will spend more than $2000 AND 
use a credit card that gives them a mile of air travel for every dollar they charge? 

A. (0.30) (0.35) 

B. (0.80) (0.35) 

C. (0.80) (0.30) 

D. (0.80) 

Exercise 5.10.2 (Solution on p. 243.) 

Based upon the above information, are using a credit card that gives a mile of air travel for each 
dollar spent AND charging more than $2000 per month independent events? 

A. Yes 

B. No, and they are not mutually exclusive either 

C. No, but they are mutually exclusive 

D. Not enough information given to determine the answer 

Exercise 5.10.3 (Solution on p. 243.) 

A sociologist wants to know the opinions of employed adult women about government funding 
for day care. She obtains a list of 520 members of a local business and professional women's 
club and mails a questionnaire to 100 of these women selected at random. 68 questionnaires are 
returned. What is the population in this study? 

A. All employed adult women 

B. All the members of a local business and professional women's club 

C. The 100 women who received the questionnaire 

D. All employed women with children 

The next two questions refer to the following: An article from The San Jose Mercury News was concerned 
with the racial mix of the 1500 students at Prospect High School in Saratoga, CA. The table summarizes the 
results. (Male and female values are approximate.) Suppose one Prospect High School student is randomly 
selected. 









                             Ethnic Group
Gender    White    Asian    Hispanic    Black    American Indian
Male      400      168      115         35       16
Female    440      132      140         40       14

Table 5.16



13 This content is available online at <http://cnx.org/content/m16832/1.11/>.

Exercise 5.10.4 (Solution on p. 243.)

Find the probability that a student is Asian or Male. 

Exercise 5.10.5 (Solution on p. 243.) 

Find the probability that a student is Black given that the student is Female. 

Exercise 5.10.6 (Solution on p. 243.) 

A sample of pounds lost, in a certain month, by individual members of a weight reducing clinic 
produced the following statistics: 

• Mean = 5 lbs. 

• Median = 4.5 lbs. 

• Mode = 4 lbs. 

• Standard deviation = 3.8 lbs. 

• First quartile = 2 lbs. 

• Third quartile = 8.5 lbs. 

The correct statement is: 

A. One fourth of the members lost exactly 2 pounds. 

B. The middle fifty percent of the members lost from 2 to 8.5 lbs. 

C. Most people lost 3.5 to 4.5 lbs. 

D. All of the choices above are correct. 

Exercise 5.10.7 (Solution on p. 243.) 

What does it mean when a data set has a standard deviation equal to zero? 

A. All values of the data appear with the same frequency. 

B. The mean of the data is also zero. 

C. All of the data have the same value. 

D. There are no data to begin with. 

Exercise 5.10.8 (Solution on p. 243.) 

The statement that best describes the illustration below is: 



Figure 5.1 



A. The mean is equal to the median. 

B. There is no first quartile. 

C. The lowest data value is the median. 

D. The median equals (Q1 + Q3)/2.

Exercise 5.10.9 (Solution on p. 243.)

According to a recent article (San Jose Mercury News) the average number of babies born with 
significant hearing loss (deafness) is approximately 2 per 1000 babies in a healthy baby nursery. 
The number climbs to an average of 30 per 1000 babies in an intensive care nursery. 

Suppose that 1000 babies from healthy baby nurseries were randomly surveyed. Find the proba- 
bility that exactly 2 babies were born deaf. 

Exercise 5.10.10 (Solution on p. 243.) 

A "friend" offers you the following "deal." For a $10 fee, you may pick an envelope from a box 
containing 100 seemingly identical envelopes. However, each envelope contains a coupon for a 
free gift. 

• 10 of the coupons are for a free gift worth $6. 

• 80 of the coupons are for a free gift worth $8. 

• 6 of the coupons are for a free gift worth $12. 

• 4 of the coupons are for a free gift worth $40. 

Based upon the financial gain or loss over the long run, should you play the game? 

A. Yes, I expect to come out ahead in money. 

B. No, I expect to come out behind in money. 

C. It doesn't matter. I expect to break even. 

The next four questions refer to the following: Recently, a nurse commented that when a patient calls the 
medical advice line claiming to have the flu, the chance that he/she truly has the flu (and not just a nasty 
cold) is only about 4%. Of the next 25 patients calling in claiming to have the flu, we are interested in how 
many actually have the flu. 

Exercise 5.10.11 (Solution on p. 243.) 

Define the Random Variable and list its possible values. 

Exercise 5.10.12 (Solution on p. 243.) 

State the distribution of X . 

Exercise 5.10.13 (Solution on p. 243.) 

Find the probability that at least 4 of the 25 patients actually have the flu. 

Exercise 5.10.14 (Solution on p. 243.) 

On average, for every 25 patients calling in, how many do you expect to have the flu? 

The next two questions refer to the following: Different types of writing can sometimes be distinguished 
by the number of letters in the words used. A student interested in this fact wants to study the number of 
letters of words used by Tom Clancy in his novels. She opens a Clancy novel at random and records the 
number of letters of the first 250 words on the page. 

Exercise 5.10.15 (Solution on p. 243.) 

What kind of data was collected? 

A. qualitative 

B. quantitative - continuous 

C. quantitative - discrete 

Exercise 5.10.16 (Solution on p. 243.) 

What is the population under study?

5.11 Lab 1: Discrete Distribution (Playing Card Experiment) 14

Class Time: 
Names: 

5.11.1 Student Learning Outcomes: 

• The student will compare empirical data and a theoretical distribution to determine if an everyday
experiment fits a discrete distribution.

• The student will demonstrate an understanding of long-term probabilities. 

5.11.2 Supplies: 

• One full deck of playing cards 

5.11.3 Procedure 

The experiment procedure is to pick one card from a deck of shuffled cards. 

1. The theoretical probability of picking a diamond from a deck is:

2. Shuffle a deck of cards. 

3. Pick one card from it. 

4. Record whether it was a diamond or not a diamond. 

5. Put the card back and reshuffle. 

6. Do this a total of 10 times.

7. Record the number of diamonds picked. 

8. Let X = number of diamonds. Theoretically, X ~ B(____, ____)



5.11.4 Organize the Data 

1. Record the number of diamonds picked for your class in the chart below. Then calculate the relative
frequency.

x      Frequency    Relative Frequency
0      ____         ____
1      ____         ____
2      ____         ____
3      ____         ____
4      ____         ____
5      ____         ____
6      ____         ____
7      ____         ____
8      ____         ____
9      ____         ____
10     ____         ____

Table 5.17

14 This content is available online at <http://cnx.org/content/m16827/1.12/>.



2. Calculate the following: 

a. x̄ =

b. s = 

3. Construct a histogram of the empirical data. 



Figure 5.2: Blank axes for the histogram (vertical axis: Relative Frequency; horizontal axis: Number of Diamonds)

5.11.5 Theoretical Distribution

1. Build the theoretical PDF chart based on the distribution in the Procedure section above.

x      P(x)
0      ____
1      ____
2      ____
3      ____
4      ____
5      ____
6      ____
7      ____
8      ____
9      ____
10     ____

Table 5.18

2. Calculate the following:

a. μ =

b. σ =



3. Construct a histogram of the theoretical distribution. 



Figure 5.3: Blank axes for the histogram (vertical axis: Probability; horizontal axis: Number of Diamonds)

5.11.6 Using the Data

Calculate the following, rounding to 4 decimal places: 

NOTE: RF = relative frequency 
Use the table from the section titled "Theoretical Distribution" here: 

• P(x = 3) = 

• P(1 < x < 4) =

• P(x > 8) = 

Use the data from the section titled "Organize the Data" here: 



• RF(x = 3) =

• RF(1 < x < 4) =

• RF(x > 8) =



5.11.7 Discussion Questions 

For questions 1. and 2., think about the shapes of the two graphs, the probabilities and the relative frequen- 
cies, the means, and the standard deviations. 

1. Knowing that data vary, describe three similarities between the graphs and distributions of the theo- 
retical and empirical distributions. Use complete sentences. (Note: These answers may vary and still 
be correct.) 

2. Describe the three most significant differences between the graphs or distributions of the theoretical 
and empirical distributions. (Note: These answers may vary and still be correct.) 

3. Using your answers from the two previous questions, does it appear that the data fit the theoretical 
distribution? In 1 - 3 complete sentences, explain why or why not. 

4. Suppose that the experiment had been repeated 500 times. Which table (from "Organize the Data"
or "Theoretical Distribution") would you expect to change (and how would it change)? Why? Why
wouldn't the other table change?

5.12 Lab 2: Discrete Distribution (Lucky Dice Experiment) 15

Class Time: 
Names: 

5.12.1 Student Learning Outcomes: 

• The student will compare empirical data and a theoretical distribution to determine if a Tet gambling 
game fits a discrete distribution. 

• The student will demonstrate an understanding of long-term probabilities. 



5.12.2 Supplies: 

• 1 game "Lucky Dice" or 3 regular dice 

NOTE: For a detailed game description, refer here. (The link goes to the beginning of Discrete 
Random Variables Homework. Please refer to Problem #14.) 

NOTE: Round relative frequencies and probabilities to four decimal places. 

5.12.3 The Procedure 

1. The experiment procedure is to bet on one object. Then, roll 3 Lucky Dice and count the number of 
matches. The number of matches will decide your profit. 

2. What is the theoretical probability of 1 die matching the object? 

3. Choose one object to place a bet on. Roll the 3 Lucky Dice. Count the number of matches. 

4. Let X = number of matches. Theoretically, X ~ B(____, ____)

5. Let Y = profit per game. 



5.12.4 Organize the Data 

In the chart below, fill in the Y value that corresponds to each X value. Next, record the number of matches 
picked for your class. Then, calculate the relative frequency. 

1. Complete the table. 



x      y       Frequency    Relative Frequency
0      ____    ____         ____
1      ____    ____         ____
2      ____    ____         ____
3      ____    ____         ____

Table 5.19



15 This content is available online at <http://cnx.org/content/m16826/1.12/>.

2. Calculate the following:

a. x̄ =

b. s_x =

c. ȳ =

d. s_y =

3. Explain what x̄ represents.

4. Explain what ȳ represents.

5. Based upon the experiment: 

a. What was the average profit per game? 

b. Did this represent an average win or loss per game? 

c. How do you know? Answer in complete sentences. 

6. Construct a histogram of the empirical data.



Figure 5.4: Blank axes for the histogram (vertical axis: Relative Frequency; horizontal axis: Number of Matches)



5.12.5 Theoretical Distribution 

1. Build the theoretical PDF chart for X and Y based on the distribution from the section titled "The Procedure."

x      y       P(x) = P(y)
0      ____    ____
1      ____    ____
2      ____    ____
3      ____    ____

Table 5.20



2. Calculate the following:

a. μ_x =

b. σ_x =

c. μ_y =

3. Explain what μ_x represents.

4. Explain what μ_y represents.

5. Based upon theory: 

a. What was the expected profit per game? 

b. Did the expected profit represent an average win or loss per game? 

c. How do you know? Answer in complete sentences. 

6. Construct a histogram of the theoretical distribution. 



Figure 5.5: Blank axes for the histogram (vertical axis: Probability; horizontal axis: Number of Matches)



5.12.6 Use the Data 

Calculate the following (rounded to 4 decimal places): 

NOTE: RF = relative frequency 
Use the data from the section titled "Theoretical Distribution" here: 

1. P(x = 3) =

2. P(0 < x < 3) =

3. P(x > 2) =

Use the data from the section titled "Organize the Data" here:

1. RF(x = 3) =

2. RF(0 < x < 3) =

3. RF(x > 2) =



5.12.7 Discussion Question 

For questions 1. and 2., consider the graphs, the probabilities and relative frequencies, the means and the 
standard deviations. 

1. Knowing that data vary, describe three similarities between the graphs and distributions of the theo- 
retical and empirical distributions. Use complete sentences. (Note: these answers may vary and still 
be correct.) 

2. Describe the three most significant differences between the graphs or distributions of the theoretical 
and empirical distributions. (Note: these answers may vary and still be correct.) 

3. Thinking about your answers to 1. and 2., does it appear that the data fit the theoretical distribution? 
In 1 - 3 complete sentences, explain why or why not. 

4. Suppose that the experiment had been repeated 500 times. Which table (from "Organize the Data" or
"Theoretical Distribution") would you expect to change? Why? How might the table change?

Solutions to Exercises in Chapter 5

Solution to Example 5.2, Problem 1 (p. 205) 

Let X = the number of days Nancy attends class per week. 

Solution to Example 5.2, Problem 2 (p. 205) 

0, 1, 2, and 3 

Solution to Example 5.2, Problem 3 (p. 205)

x      P(x)
0      0.01
1      0.04
2      0.15
3      0.80

Table 5.21

Solution to Example 5.5, Problem 1 (p. 207)
X = amount of profit

Solution to Example 5.5, Problem 2 (p. 207)

        x      P(x)    xP(x)
WIN     10     1/3     10/3
LOSE    -6     2/3     -12/3

Table 5.22

Solution to Example 5.5, Problem 3 (p. 208)
Add the last column of the table. The expected value μ = -2/3. You lose, on average, about 67 cents each
time you play the game, so you do not come out ahead.

Solution to Example 5.9, Problem 1 (p. 209)
failure

Solution to Example 5.9, Problem 2 (p. 209)
X = the number of statistics students who do their homework on time

Solution to Example 5.9, Problem 3 (p. 209)
0, 1, 2, . . ., 50

Solution to Example 5.9, Problem 4 (p. 209)
Failure is a student who does not do his or her homework on time.

Solution to Example 5.9, Problem 5 (p. 209)
q = 0.30

Solution to Example 5.9, Problem 6 (p. 210)
greater than or equal to (≥)

Solutions to Practice 2: Binomial Distribution

Solution to Exercise 5.8.1 (p. 215) 
X = the number that reply "yes"
Solution to Exercise 5.8.2 (p. 215)
B(8, 0.713)

Solution to Exercise 5.8.3 (p. 215) 
0,1,2,3,4,5,6,7,8 

Solution to Exercise 5.8.5 (p. 215) 
5.7 

Solution to Exercise 5.8.6 (p. 215) 
1.28 

Solution to Exercise 5.8.7 (p. 216) 
0.4151 

Solution to Exercise 5.8.8 (p. 216) 
0.9990 

Solutions to Homework 
Solution to Exercise 5.9.1 (p. 217) 

a. 0.1 

b. 1.6 

Solution to Exercise 5.9.3 (p. 217) 

b. $200,000;$600,000;$400,000 

c. third investment 

d. first investment 

e. second investment 

Solution to Exercise 5.9.5 (p. 218) 

a. 0.2 

c. 2.35 

d. 2-3 children 

Solution to Exercise 5.9.7 (p. 219) 

a. X = the number of dice that show a 1 

b. 0,1,2,3,4,5,6 
c. X ~ B(6, 1/6)

d. 1 

e. 0.00002 

f. 3 dice 

Solution to Exercise 5.9.9 (p. 219) 

a. X = the number of students that will attend Tet. 

b. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 

c. X~B(12,0.18) 

d. 2.16 

e. 0.9511 

f. 0.3702 

Solution to Exercise 5.9.11 (p. 219)

a. X = the number of fencers that do not use foil as their main weapon

b. 0, 1, 2, 3,... 25 

c. X~B(25,0.40) 

d. 10 

e. 0.0442 

f. Yes 

Solution to Exercise 5.9.13 (p. 220) 

a. X = the number of fortune cookies that have an extra fortune 

b. 0, 1, 2, 3,... 144 

c. X~B(144, 0.03) or P(4.32) 

d. 4.32 

e. 0.0124 or 0.0133 

f. 0.6300 or 0.6264 

Solution to Exercise 5.9.15 (p. 220) 

a. X = the number of women that suffer from anorexia 

b. 0, 1, 2, 3,... 600 (can leave off 600) 

c. X~P(3) 

d. 3 

e. 0.0498 

f. 0.1847 

Solution to Exercise 5.9.17 (p. 221) 

a. X = the number of children for a Spanish woman 

b. 0, 1, 2, 3,... 

c. X~P(1.47) 

d. 0.2299 

e. 0.5679 

f. 0.4321 

Solution to Exercise 5.9.19 (p. 221) 

a. X = the number of dealers she calls until she finds one with a used red Miata 

b. 1,2,3,... 

c. X~G(0.28) 

d. 3.57 

e. 0.7313 

f. 0.2497 

Solution to Exercise 5.9.21 (p. 222) 

d. 4.31 

e. 0.4079 

f. 0.0163 

Solution to Exercise 5.9.23 (p. 222) 

d. 2 

e. 0.1353 

f. 0.3233 

Solution to Exercise 5.9.25 (p. 222)

a. X = the number of seniors that participated in after-school sports all 4 years of high school

b. 0, 1, 2, 3,... 60 

c. X~P(4.8) 

d. 4.8 

e. Yes 

f. 4 

Solution to Exercise 5.9.27 (p. 223) 

a. X = the number of shell pieces in one cake 

b. 0, 1, 2, 3,... 

c. X~P(1.5) 

d. 1.5 

e. 0.2231 

f. 0.0001 

g. Yes 

Solution to Exercise 5.9.29 (p. 223) 

d. 0.0043 

e. none 

f. 3.3 

Solution to Exercise 5.9.31 (p. 223) 

d. 3.02 

e. No 

f. 0.9997 
h. 0.3881 

i. 6.6207 pages 

Solution to Exercise 5.9.33 (p. 224) 
D: 4.43 
Solution to Exercise 5.9.34 (p. 224) 

A: 0.1476 
Solution to Exercise 5.9.35 (p. 224) 

C: 0.4734 
Solution to Exercise 5.9.36 (p. 224) 

A: The number of times Mrs. Plum's cats wake her up each week 
Solution to Exercise 5.9.37 (p. 225) 

D: 0.0671 
Solution to Exercise 5.9.38 (p. 225) 

Partial Answer: 

A: X = the number of DVDs a Video to Go customer rents 
B: 0.12
C: 0.11
D: 0.77 
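
The partial answers B through D can be rechecked with a short Python sketch. It assumes that the first column of Table 5.14 is x = 0 (the entry with probability 0.03); the variable names are illustrative and not part of the textbook.

```python
from fractions import Fraction

# Known entries of Table 5.14 (Video To Go), in hundredths.
# Assumption: the first column of the table corresponds to x = 0.
known = {0: 3, 1: 50, 2: 24, 4: 7, 5: 4}
pmf = {x: Fraction(n, 100) for x, n in known.items()}

# B. The probabilities must sum to 1, which fixes the missing P(X = 3).
pmf[3] = 1 - sum(pmf.values())

# C. "At least 4" means x = 4 or x = 5.
p_at_least_4 = pmf[4] + pmf[5]

# D. "At most 2" means x = 0, 1, or 2.
p_at_most_2 = pmf[0] + pmf[1] + pmf[2]

print(float(pmf[3]), float(p_at_least_4), float(p_at_most_2))
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.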
Solution to Exercise 5.9.39 (p. 225) 

The variable of interest is X = net gain or loss, in dollars 

The face cards are J, Q, K (Jack, Queen, King). There are (3)(4) = 12 face cards and 52 - 12 = 40 cards that are not
face cards.

We first need to construct the probability distribution for X. We use the card and coin events to determine
the probability for each outcome, but we use the monetary value of X to determine the expected value.

Card Event                      $X net gain or loss    P(X)
Face Card and Heads             6                      (12/52)(1/2) = 6/52
Face Card and Tails             2                      (12/52)(1/2) = 6/52
(Not Face Card) and (H or T)    -2                     (40/52)(1) = 40/52

Table 5.23



• Expected value = (6)(6/52) + (2)(6/52) + (-2) (40/52) = -32/52 

• Expected value = -$0.62, rounded to the nearest cent 

• If you play this game repeatedly over a large number of games, you would expect to lose 62 cents per
game, on average.

• You should not play this game to win money because the expected value indicates an expected aver- 
age loss. 
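
The expected value computed in this solution can be reproduced with a minimal Python sketch. The outcome list mirrors the card and coin events described in the solution; the variable names are hypothetical.

```python
from fractions import Fraction

p_face = Fraction(12, 52)   # 12 face cards in a 52-card deck
p_heads = Fraction(1, 2)    # fair coin

# (net gain in dollars, probability) for each outcome of the game
outcomes = [
    (6, p_face * p_heads),         # face card and Heads
    (2, p_face * (1 - p_heads)),   # face card and Tails
    (-2, 1 - p_face),              # not a face card, any coin result
]

# A valid probability distribution must sum to 1.
assert sum(p for _, p in outcomes) == 1

# Expected value = sum of (value)(probability) over all outcomes.
expected_value = sum(x * p for x, p in outcomes)
print(expected_value, round(float(expected_value), 2))
```

A negative expected value means an average loss per game over many plays, which is the reasoning behind the "should not play" conclusion.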

Solution to Exercise 5.9.40 (p. 226) 

Start by writing the probability distribution. X is net gain or loss = prize (if any) less $10 cost of ticket 



X = $ net gain or loss    P(X)
$500 - $10 = $490         1/100
$100 - $10 = $90          2/100
$25 - $10 = $15           4/100
$0 - $10 = -$10           93/100

Table 5.24



Expected Value = (490)(1/100) + (90)(2/100) + (15)(4/100) + (-10) (93/100) = -$2. There is an expected loss 
of $2 per ticket, on average. 
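
As a cross-check, the distribution can be built directly from the prize counts stated in the exercise rather than typed in by hand. This is a sketch only; the constant and variable names are assumptions made for illustration.

```python
from fractions import Fraction

TICKET_COST = 10
TOTAL_TICKETS = 100
prizes = {500: 1, 100: 2, 25: 4}   # prize amount -> number of winning tickets

# Every ticket that wins nothing still costs $10.
losing_tickets = TOTAL_TICKETS - sum(prizes.values())

# Net gain -> probability (prize less the ticket cost).
pmf = {prize - TICKET_COST: Fraction(count, TOTAL_TICKETS)
       for prize, count in prizes.items()}
pmf[-TICKET_COST] = Fraction(losing_tickets, TOTAL_TICKETS)

expected_gain = sum(x * p for x, p in pmf.items())
print(losing_tickets, expected_gain)
```

Deriving the count of losing tickets from the total makes it harder to end up with probabilities that do not sum to 1.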
Solution to Exercise 5.9.41 (p. 226) 

• X = number of questions answered correctly 

• X ~ B(10, 0.5)

• We are interested in AT LEAST 70% of 10 questions correct. 70% of 10 is 7. We want to find the 
probability that X is greater than or equal to 7. The event "at least 7" is the complement of "less than 
or equal to 6". 

• Using your calculator's distribution menu: 1 - binomcdf(10, 0.5, 6) gives 0.171875

• The probability of getting at least 70% of the 10 questions correct when randomly guessing is approx- 
imately 0.172 
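
For readers without a calculator that offers binomcdf, the same complement calculation can be sketched in Python. The helper functions binom_pmf and binom_cdf are written here for illustration; they are not functions from the textbook or from a statistics package.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# "At least 7" is the complement of "less than or equal to 6".
p_pass = 1 - binom_cdf(6, 10, 0.5)
print(round(p_pass, 6))
```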

Solution to Exercise 5.9.42 (p. 226) 

• X = number of questions answered correctly 

• X ~ B(32, 1/3)

• We are interested in MORE THAN 75% of 32 questions correct. 75% of 32 is 24. We want to find 
P(x>24). The event "more than 24" is the complement of "less than or equal to 24". 

• Using your calculator's distribution menu: 1 - binomcdf(32, 1/3, 24) 

• P(x>24) = 0.00000026761 

• The probability of getting more than 75% of the 32 questions correct when randomly guessing is very 
small and practically zero. 
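
Because this probability is so small, one reasonable alternative to the complement rule is to add the upper-tail terms P(X = 25) through P(X = 32) directly, which avoids subtracting two nearly equal numbers. The sketch below does this with exact fractions; it is an illustration, not the method used in the solution above.

```python
from fractions import Fraction
from math import comb

n, p = 32, Fraction(1, 3)

# P(X > 24) = P(X = 25) + ... + P(X = 32), computed exactly.
upper_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                 for k in range(25, n + 1))

print(float(upper_tail))
```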
Solution to Exercise 5.9.43 (p. 226)

A: X can take on the values 1, 2, 3, .... p = 2/6, q = 4/6 
B: 0.2222 
C:3 
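
Parts B and C follow from the geometric distribution with p = 2/6. A minimal Python sketch, with a helper function named geometric_pmf chosen here for illustration:

```python
from fractions import Fraction

p = Fraction(2, 6)   # probability of rolling a "4" or a "5"
q = 1 - p            # probability of any other face

def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

# B. One failure followed by one success.
p_second_trial = geometric_pmf(2, p)

# C. The mean of a geometric distribution is 1/p.
expected_trials = 1 / p

print(float(p_second_trial), expected_trials)
```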

Solutions to Review 

Solution to Exercise 5.10.1 (p. 227) 
C 

Solution to Exercise 5.10.2 (p. 227) 
B 

Solution to Exercise 5.10.3 (p. 227) 
A 

Solution to Exercise 5.10.4 (p. 228) 
0.5773 

Solution to Exercise 5.10.5 (p. 228) 
0.0522 
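
The two answers above (Exercises 5.10.4 and 5.10.5) come from the counts in Table 5.16. The following Python sketch recomputes them; the dictionary layout is an assumption made for illustration.

```python
from fractions import Fraction

# Counts from Table 5.16 (Prospect High School).
counts = {
    "Male":   {"White": 400, "Asian": 168, "Hispanic": 115,
               "Black": 35, "American Indian": 16},
    "Female": {"White": 440, "Asian": 132, "Hispanic": 140,
               "Black": 40, "American Indian": 14},
}

total = sum(sum(row.values()) for row in counts.values())
males = sum(counts["Male"].values())
females = sum(counts["Female"].values())
asians = sum(row["Asian"] for row in counts.values())

# P(Asian OR Male) = P(Asian) + P(Male) - P(Asian AND Male)
p_asian_or_male = Fraction(asians + males - counts["Male"]["Asian"], total)

# P(Black GIVEN Female) restricts the sample space to female students.
p_black_given_female = Fraction(counts["Female"]["Black"], females)

print(round(float(p_asian_or_male), 4), round(float(p_black_given_female), 4))
```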

Solution to Exercise 5.10.6 (p. 228) 
B 

Solution to Exercise 5.10.7 (p. 228) 
C 

Solution to Exercise 5.10.8 (p. 228) 
C 

Solution to Exercise 5.10.9 (p. 229) 
0.2709 
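
The exercise does not state which model to use. The sketch below compares two candidates and suggests that the printed answer matches a binomial model, X ~ B(1000, 0.002), while a Poisson model with mean 2 gives a slightly different value. Both model choices are assumptions made here for comparison.

```python
from math import comb, exp, factorial

n, p, k = 1000, 0.002, 2

# Binomial model: 1000 independent babies, each with probability 0.002.
binomial = comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson approximation with the same mean, n * p = 2.
lam = n * p
poisson = exp(-lam) * lam**k / factorial(k)

print(round(binomial, 4), round(poisson, 4))
```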

Solution to Exercise 5.10.10 (p. 229) 
B 

Solution to Exercise 5.10.11 (p. 229) 

X = the number of patients calling in claiming to have the flu, who actually have the flu. X = 0, 1, 2, ...25 
Solution to Exercise 5.10.12 (p. 229) 
X ~ B(25, 0.04)

Solution to Exercise 5.10.13 (p. 229) 
0.0165 

Solution to Exercise 5.10.14 (p. 229) 
1 

Solution to Exercise 5.10.15 (p. 229) 
C 

Solution to Exercise 5.10.16 (p. 229) 
All words used by Tom Clancy in his novels

Chapter 6

The Normal Distribution 



6.1 The Normal Distribution 1 

6.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Recognize the normal probability distribution and apply it appropriately. 

• Recognize the standard normal probability distribution and apply it appropriately. 

• Compare normal probabilities by converting to the standard normal distribution. 

6.1.2 Introduction 

The normal, a continuous distribution, is the most important of all the distributions. It is widely used 
and even more widely abused. Its graph is bell-shaped. You see the bell curve in almost all disciplines. 
Some of these include psychology, business, economics, the sciences, nursing, and, of course, mathematics. 
Some of your instructors may use the normal distribution to help determine your grade. Most IQ scores are 
normally distributed. Often real estate prices fit a normal distribution. The normal distribution is extremely 
important but it cannot be applied to everything in the real world. 

In this chapter, you will study the normal distribution, the standard normal, and applications associated 
with them. 

6.1.3 Optional Collaborative Classroom Activity 

Your instructor will record the heights of both men and women in your class, separately. Draw histograms 
of your data. Then draw a smooth curve through each histogram. Is each curve somewhat bell-shaped? Do 
you think that if you had recorded 200 data values for men and 200 for women that the curves would look 
bell-shaped? Calculate the mean for each data set. Write the means on the x-axis of the appropriate graph 
below the peak. Shade the approximate area that represents the probability that one randomly chosen 
male is taller than 72 inches. Shade the approximate area that represents the probability that one randomly 
chosen female is shorter than 60 inches. If the total area under each curve is one, does either probability 
appear to be more than 0.5? 



1 This content is available online at <http://cnx.org/content/m16979/1.12/>.

The normal distribution has two parameters (two numerical descriptive measures), the mean (μ) and the
standard deviation (σ). If X is a quantity to be measured that has a normal distribution with mean (μ) and
standard deviation (σ), we designate this by writing

NORMAL: X ~ N(μ, σ)




The probability density function is a rather complicated function. Do not memorize it. It is not necessary. 



$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$



The cumulative distribution function is P (X < x) . It is calculated either by a calculator or a computer or 
it is looked up in a table. Technology has made the tables basically obsolete. For that reason, as well as 
the fact that there are various table formats, we are not including table instructions in this chapter. See the 
NOTE in this chapter in Calculation of Probabilities. 

The curve is symmetrical about a vertical line drawn through the mean, μ. In theory, the mean is the same
as the median since the graph is symmetric about μ. As the notation indicates, the normal distribution
depends only on the mean and the standard deviation. Since the area under the curve must equal one, a
change in the standard deviation, σ, causes a change in the shape of the curve; the curve becomes fatter or
skinnier depending on σ. A change in μ causes the graph to shift to the left or right. This means there are an
infinite number of normal probability distributions. One of special interest is called the standard normal 
distribution. 
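
The effect of the two parameters on the curve can be explored numerically. The Python sketch below evaluates the normal density for a few illustrative parameter values (μ = 5 or 8, σ = 1 or 2, all chosen here as assumptions) and cross-checks one value against statistics.NormalDist from the Python standard library.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Symmetry about the mean: equal heights at mu - 2 and mu + 2.
left, right = normal_pdf(3, 5, 2), normal_pdf(7, 5, 2)

# A larger sigma gives a lower, wider curve; the peak sits at x = mu.
peak_narrow = normal_pdf(5, 5, 1)
peak_wide = normal_pdf(5, 5, 2)

# Changing mu only shifts the curve; the peak height is unchanged.
shifted_peak = normal_pdf(8, 8, 2)

# Cross-check one value against the standard library.
library_value = NormalDist(mu=5, sigma=2).pdf(6)

print(left == right, peak_narrow > peak_wide, shifted_peak == peak_wide)
```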

6.2 The Standard Normal Distribution 2 

The standard normal distribution is a normal distribution of standardized values called z-scores. A z- 
score is measured in units of the standard deviation. For example, if the mean of a normal distribution is 
5 and the standard deviation is 2, the value 11 is 3 standard deviations above (or to the right of) the mean. 
The calculation is: 



$$x = \mu + (z)\sigma = 5 + (3)(2) = 11 \tag{6.1}$$

The z-score is 3.

The mean for the standard normal distribution is 0 and the standard deviation is 1. The transformation
z = (x - μ)/σ produces the distribution Z ~ N(0, 1). The value x comes from a normal distribution with
mean μ and standard deviation σ.

2 This content is available online at <http://cnx.org/content/m16986/1.7/>.

6.3 Z-scores 3

If X is a normally distributed random variable and X ~ N(μ, σ), then the z-score is:

$$z = \frac{x - \mu}{\sigma} \tag{6.2}$$

The z-score tells you how many standard deviations the value x is above (to the right of) or below
(to the left of) the mean, μ. Values of x that are larger than the mean have positive z-scores and values of x
that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of 0.

Example 6.1 

Suppose $X \sim N(5, 6)$. This says that X is a normally distributed random variable with mean
$\mu = 5$ and standard deviation $\sigma = 6$. Suppose x = 17. Then:

$$z = \frac{x - \mu}{\sigma} = \frac{17 - 5}{6} = 2 \qquad (6.3)$$

This means that x = 17 is 2 standard deviations ($2\sigma$) above or to the right of the mean $\mu = 5$.
The standard deviation is $\sigma = 6$.

Notice that:

$$5 + 2 \cdot 6 = 17 \quad (\text{The pattern is } \mu + z\sigma = x.) \qquad (6.4)$$

Now suppose x = 1. Then:

$$z = \frac{x - \mu}{\sigma} = \frac{1 - 5}{6} = -0.67 \quad (\text{rounded to two decimal places}) \qquad (6.5)$$

This means that x = 1 is 0.67 standard deviations ($-0.67\sigma$) below or to the left of the mean
$\mu = 5$. Notice that:

5 + (-0.67)(6) is approximately equal to 1 (This has the pattern $\mu + (-0.67)\sigma = 1$.)

Summarizing, when z is positive, x is above or to the right of $\mu$ and when z is negative, x is to the
left of or below $\mu$.
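The two calculations in this example can be reproduced with a few lines of Python. This is only a sketch; the helper name `z_score` is a hypothetical name introduced here for illustration.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

mu, sigma = 5, 6             # X ~ N(5, 6) from Example 6.1

z_17 = z_score(17, mu, sigma)   # x above the mean gives a positive z-score
z_1 = z_score(1, mu, sigma)     # x below the mean gives a negative z-score

print(z_17, round(z_1, 2))

# The pattern mu + z*sigma recovers the original x values.
print(mu + z_17 * sigma, mu + z_1 * sigma)
```

The sign of the result carries the "above or below the mean" information, so no separate check of direction is needed.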

Example 6.2 

Some doctors believe that a person can lose 5 pounds, on the average, in a month by reducing
his/her fat intake and by exercising consistently. Suppose weight loss has a normal distribution.
Let X = the amount of weight lost (in pounds) by a person in a month. Use a standard deviation
of 2 pounds. $X \sim N(5, 2)$. Fill in the blanks.

Problem 1 (Solution on p. 269.)

Suppose a person lost 10 pounds in a month. The z-score when x = 10 pounds is z = 2.5
(verify). This z-score tells you that x = 10 is ______ standard deviations to the ______ (right
or left) of the mean ______ (What is the mean?).

Problem 2 (Solution on p. 269.)

Suppose a person gained 3 pounds (a negative weight loss). Then z = ______. This z-score
tells you that x = -3 is ______ standard deviations to the ______ (right or left) of the mean.

Suppose the random variables X and Y have the following normal distributions: $X \sim N(5, 6)$ and
$Y \sim N(2, 1)$. If x = 17, then z = 2. (This was previously shown.) If y = 4, what is z?

$$z = \frac{y - \mu}{\sigma} = \frac{4 - 2}{1} = 2 \quad \text{where } \mu = 2 \text{ and } \sigma = 1. \qquad (6.6)$$



3 This content is available online at <http://cnx.org/content/ml6991/1.9/>. 
The z-score for y = 4 is z = 2. This means that 4 is 2 standard deviations to the right of
the mean. Therefore, x = 17 and y = 4 are both 2 (of their) standard deviations to the right of
their respective means.

The z-score allows us to compare data that are scaled differently. To understand the concept,
suppose $X \sim N(5, 6)$ represents weight gains for one group of people who are trying to gain
weight in a 6 week period and $Y \sim N(2, 1)$ measures the same weight gain for a second group
of people. A negative weight gain would be a weight loss. Since x = 17 and y = 4 are each 2
standard deviations to the right of their means, they represent the same weight gain relative to
their means.

The Empirical Rule 

If X is a random variable and has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the
Empirical Rule says (see the figure below):

• About 68.27% of the x values lie between $-1\sigma$ and $+1\sigma$ of the mean $\mu$ (within 1 standard deviation of
the mean).

• About 95.45% of the x values lie between $-2\sigma$ and $+2\sigma$ of the mean $\mu$ (within 2 standard deviations of
the mean).

• About 99.73% of the x values lie between $-3\sigma$ and $+3\sigma$ of the mean $\mu$ (within 3 standard deviations of
the mean). Notice that almost all the x values lie within 3 standard deviations of the mean.

• The z-scores for $+1\sigma$ and $-1\sigma$ are +1 and -1, respectively.

• The z-scores for $+2\sigma$ and $-2\sigma$ are +2 and -2, respectively.

• The z-scores for $+3\sigma$ and $-3\sigma$ are +3 and -3, respectively.

[Figure: normal curve with the horizontal axis marked at $-3\sigma$, $-2\sigma$, $-1\sigma$, $\mu$, $1\sigma$, $2\sigma$, $3\sigma$]
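The three percentages in the Empirical Rule can be recovered from the standard normal distribution. The sketch below assumes Python's `statistics.NormalDist` as the calculation tool (the text itself uses a TI calculator or a table); the names `Z` and `within` are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard normal distribution

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s) of the mean: {area:.2%}")
```

Because every normal distribution standardizes to Z, the same three areas apply to any mean and standard deviation.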



Example 6.3 

Suppose X has a normal distribution with mean 50 and standard deviation 6.

• About 68.27% of the x values lie within $1\sigma = (1)(6) = 6$ of the mean 50, that is, between
50 - 6 = 44 and 50 + 6 = 56. The values 44 and 56 are within 1 standard deviation of the mean 50.
The z-scores are -1 and +1 for 44 and 56, respectively.

• About 95.45% of the x values lie within $2\sigma = (2)(6) = 12$ of the mean 50, that is, between
50 - 12 = 38 and 50 + 12 = 62. The values 38 and 62 are within 2 standard deviations of the mean 50.
The z-scores are -2 and +2 for 38 and 62, respectively.

• About 99.73% of the x values lie within $3\sigma = (3)(6) = 18$ of the mean 50, that is, between
50 - 18 = 32 and 50 + 18 = 68. The values 32 and 68 are within 3 standard deviations of the mean 50.
The z-scores are -3 and +3 for 32 and 68, respectively.
6.4 Areas to the Left and Right of x 4

The arrow in the graph below points to the area to the left of x. This area is represented by the probability
P(X < x). Normal tables, computers, and calculators provide or calculate the probability P(X < x).

[Figure: normal curve with the area to the left of x shaded and labeled P(X < x)]

The area to the right is then P(X > x) = 1 - P(X < x).

Remember, P(X < x) = Area to the left of the vertical line through x.

P(X > x) = 1 - P(X < x) = Area to the right of the vertical line through x.

P(X < x) is the same as P(X ≤ x) and P(X > x) is the same as P(X ≥ x) for continuous distributions.
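As a small numerical illustration of the left and right areas, the sketch below uses the standard normal distribution and the value z = -2. Both are illustrative assumptions chosen here, not values taken from this section.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)   # standard normal, used only as an example distribution
z = -2                 # illustrative cut point

area_left = Z.cdf(z)           # P(Z < z): area to the left of z
area_right = 1 - area_left     # P(Z > z): the complement

print(round(area_left, 4), round(area_right, 4))
```

The two areas always sum to 1, which is why a table or calculator only needs to supply the area to the left.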

6.5 Calculations of Probabilities 5 

Probabilities are calculated by using technology. There are instructions in the chapter for the TI-83+ and 
TI-84 calculators. 

NOTE: In the Table of Contents for Collaborative Statistics, entry 15. Tables has a link to a table
of normal probabilities. Use the probability tables if so desired, instead of a calculator. The tables
include instructions for how to use them.

Example 6.4 

If the area to the left is 0.0228, then the area to the right is 1 - 0.0228 = 0.9772. 

Example 6.5 

The final exam scores in a statistics class were normally distributed with a mean of 63 and a 
standard deviation of 5. 

Problem 1 

Find the probability that a randomly selected student scored more than 65 on the exam. 

Solution 

Let X = a score on the final exam. $X \sim N(63, 5)$, where $\mu = 63$ and $\sigma = 5$.

Draw a graph. 

Then, find P (x > 65). 

P (x > 65) = 0.3446 (calculator or computer) 



4 This content is available online at <http://cnx.org/content/ml6976/1.5/>. 
5 This content is available online at <http://cnx.Org/content/ml6977/l.12/>.
[Figure: normal curve centered at 63 with the area to the right of 65 shaded; shaded area = 0.3446]

The probability that one student scores more than 65 is 0.3446. 

Using the TI-83+ or the TI-84 calculators, the calculation is as follows. Go into 2nd DISTR.

After pressing 2nd DISTR, press 2:normalcdf.

The syntax for the instruction is shown below.

normalcdf(lower value, upper value, mean, standard deviation) For this problem:
normalcdf(65,1E99,63,5) = 0.3446. You get 1E99 (= $10^{99}$) by pressing 1, the EE key (a 2nd key) and then 99.
Or, you can enter 10^99 instead. The number $10^{99}$ is way out in the right tail of the normal curve.
We are calculating the area between 65 and $10^{99}$. In some instances, the lower number of the area
might be -1E99 (= $-10^{99}$). The number $-10^{99}$ is way out in the left tail of the normal curve.

HISTORICAL NOTE: The TI probability program calculates a z-score and then the probability from 
the z-score. Before technology, the z-score was looked up in a standard normal probability table 
(because the math involved is too cumbersome) to find the probability. In this example, a standard 
normal table with area to the left of the z-score was used. You calculate the z-score and look up 
the area to the left. The probability is the area to the right. 



$z = \frac{65 - 63}{5} = 0.4$. Area to the left is 0.6554. P(x > 65) = P(z > 0.4) = 1 - 0.6554 = 0.3446
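For readers working on a computer rather than a TI calculator, the same probability can be sketched in Python. The function name `normalcdf` below is a hypothetical helper written to mirror the calculator's argument order; it is not a built-in Python function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Direct route, mirroring normalcdf(65,1E99,63,5) on the calculator.
p_direct = normalcdf(65, 1e99, 63, 5)

# Table route from the historical note: find z, look up the area to the left,
# then subtract from 1 to get the area to the right.
z = (65 - 63) / 5
p_table = 1 - NormalDist(0, 1).cdf(z)

print(round(p_direct, 4), round(p_table, 4))
```

The two routes agree because standardizing x = 65 to z = 0.4 does not change the area in the tail.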



Problem 2 

Find the probability that a randomly selected student scored less than 85. 

Solution 

Draw a graph. 

Then find P(x < 85). Shade the graph. P(x < 85) = 1 (calculator or computer)
The probability that one student scores less than 85 is approximately 1 (or 100%).
The TI instructions and answer are as follows:
normalcdf(0,85,63,5) = 1 (rounds to 1)



Problem 3 

Find the 90th percentile (that is, find the score k that has 90 % of the scores below k and 10% of 
the scores above k). 

Solution 

Find the 90th percentile. For each problem or part of a problem, draw a new graph. Draw the 
x-axis. Shade the area that corresponds to the 90th percentile. 

Let k = the 90th percentile. k is located on the x-axis. P(x < k) is the area to the left of k. The 90th
percentile k separates the exam scores into those that are the same or lower than k and those that
are the same or higher. Ninety percent of the test scores are the same or lower than k and 10% are
the same or higher. k is often called a critical value.

k = 69.4 (calculator or computer)

[Figure: normal curve with the area to the left of k shaded; P(x < k) = 0.90]

The 90th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall
at or above. For the TI-83+ or TI-84 calculators, use invNorm in 2nd DISTR. invNorm(area to the
left, mean, standard deviation) For this problem, invNorm(0.90,63,5) = 69.4
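A percentile calculation like this one can also be sketched in Python. The helper name `inv_norm` is hypothetical and simply mirrors the calculator's invNorm argument order; the underlying call is `NormalDist.inv_cdf` from the standard library.

```python
from statistics import NormalDist

def inv_norm(area_left, mean, sd):
    """Return the value k with P(X < k) = area_left for X ~ N(mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(area_left)

k90 = inv_norm(0.90, 63, 5)   # 90th percentile of the exam scores

# Round trip: the area to the left of k90 should be 0.90 again.
check = NormalDist(63, 5).cdf(k90)

print(round(k90, 1), round(check, 2))
```

The round trip shows that finding a percentile is the inverse of finding an area to the left.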



Problem 4 

Find the 70th percentile (that is, find the score k such that 70% of scores are below k and 30% of 
the scores are above k). 

Solution 

Find the 70th percentile. 

Draw a new graph and label it appropriately. k = 65.6

The 70th percentile is 65.6. This means that 70% of the test scores fall at or below 65.6 and 30% fall
at or above.

invNorm(0.70,63,5) = 65.6 



Example 6.6 

A computer is used for office work at home, research, communication, personal finances, educa- 
tion, entertainment, social networking and a myriad of other things. Suppose that the average 
number of hours a household personal computer is used for entertainment is 2 hours per day. 
Assume the times for entertainment are normally distributed and the standard deviation for the 
times is half an hour. 

Problem 1 

Find the probability that a household personal computer is used between 1.8 and 2.75 hours per 
day. 

Solution 

Let X = the amount of time (in hours) a household personal computer is used for entertainment.
$X \sim N(2, 0.5)$ where $\mu = 2$ and $\sigma = 0.5$.

Find P(1.8 < x < 2.75).

The probability for which you are looking is the area between x = 1.8 and x = 2.75.
P(1.8 < x < 2.75) = 0.5886

[Figure: normal curve centered at 2 with the area between x = 1.8 and x = 2.75 shaded]

normalcdf(1.8,2.75,2,0.5) = 0.5886

The probability that a household personal computer is used between 1.8 and 2.75 hours per day 
for entertainment is 0.5886. 
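The area between two values is the difference of two areas to the left. A minimal Python sketch of that step follows, assuming `statistics.NormalDist` as the tool; the variable names are illustrative.

```python
from statistics import NormalDist

usage = NormalDist(2, 0.5)   # X ~ N(2, 0.5), hours of entertainment use per day

# P(1.8 < x < 2.75) = (area left of 2.75) - (area left of 1.8)
p_between = usage.cdf(2.75) - usage.cdf(1.8)

# Same area on the standard normal scale, using the two z-scores.
z_low, z_high = usage.zscore(1.8), usage.zscore(2.75)
p_between_z = NormalDist(0, 1).cdf(z_high) - NormalDist(0, 1).cdf(z_low)

print(round(p_between, 4), round(p_between_z, 4))
```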



Problem 2 

Find the maximum number of hours per day that the bottom quartile of households use a personal 

computer for entertainment. 

Solution 

To find the maximum number of hours per day that the bottom quartile of households uses a 
personal computer for entertainment, find the 25th percentile, k, where P (x < k) = 0.25. 



k = 1.66

[Figure: normal curve with the area to the left of k shaded; P(x < k) = 0.25 and P(x > k) = 0.75]

invNorm(0.25,2,.5) = 1.66

The maximum number of hours per day that the bottom quartile of households uses a personal 
computer for entertainment is 1.66 hours. 
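The quartile can be checked the same way in Python. This is a sketch under the assumption that `statistics.NormalDist` stands in for the calculator; `inv_cdf` and `quantiles` are both methods of that class.

```python
from statistics import NormalDist

usage = NormalDist(2, 0.5)   # X ~ N(2, 0.5), hours per day

k = usage.inv_cdf(0.25)      # 25th percentile: P(x < k) = 0.25

# quantiles(n=4) returns the three quartile cut points; the first is Q1.
q1, q2, q3 = usage.quantiles(n=4)

print(round(k, 2), round(q1, 2), round(usage.cdf(k), 2))
```

The first quartile from `quantiles` matches the 25th percentile, since they are two names for the same cut point.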
6.6 Summary of Formulas 6

Formula 6.1: Normal Probability Distribution
$X \sim N(\mu, \sigma)$

$\mu$ = the mean; $\sigma$ = the standard deviation

Formula 6.2: Standard Normal Probability Distribution

$Z \sim N(0, 1)$

z = a standardized value (z-score)

mean = 0; standard deviation = 1

Formula 6.3: Finding the kth Percentile

To find the kth percentile when the z-score is known: $k = \mu + (z)\sigma$

Formula 6.4: z-score

$z = \frac{x - \mu}{\sigma}$

Formula 6.5: Finding the area to the left
The area to the left: P(X < x)

Formula 6.6: Finding the area to the right

The area to the right: P(X > x) = 1 - P(X < x)



6 This content is available online at <http://cnx.org/content/ml6987/1.5/>. 
6.7 Practice: The Normal Distribution 7

6.7.1 Student Learning Outcomes 

• The student will analyze data following a normal distribution. 

6.7.2 Given 

The life of Sunshine CD players is normally distributed with a mean of 4.1 years and a standard deviation 
of 1.3 years. A CD player is guaranteed for 3 years. We are interested in the length of time a CD player 
lasts. 

6.7.3 Normal Distribution 

Exercise 6.7.1 

Define the Random Variable X in words. X = 

Exercise 6.7.2 
X~ 

Exercise 6.7.3 (Solution on p. 269.) 

Find the probability that a CD player will break down during the guarantee period. 

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probability.

Figure 6.1

b. P(0 < x < ______) = ______ (Use zero (0) for the minimum value of x.)

Exercise 6.7.4 (Solution on p. 269.) 

Find the probability that a CD player will last between 2.8 and 6 years. 

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probability.



7 This content is available online at <http://cnx.Org/content/ml6983/l.10/>. 
Figure 6.2

b. P(______ < x < ______) = ______

Exercise 6.7.5 (Solution on p. 269.)

Find the 70th percentile of the distribution for the time a CD player lasts.

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the lower 70%.

Figure 6.3

b. P(x < k) = ______. Therefore, k = ______.
6.8 Homework 8

Exercise 6.8.1 (Solution on p. 269.) 

According to a study done by De Anza students, the height for Asian adult males is normally 
distributed with an average of 66 inches and a standard deviation of 2.5 inches. Suppose one 
Asian adult male is randomly chosen. Let X =height of the individual. 

a. X ~ ______ ( ______ , ______ )

b. Find the probability that the person is between 65 and 69 inches. Include a sketch of the graph
and write a probability statement.

c. Would you expect to meet many Asian adult males over 72 inches? Explain why or why not,
and justify your answer numerically.

d. The middle 40% of heights fall between what two values? Sketch the graph and write the
probability statement.

Exercise 6.8.2 

IQ is normally distributed with a mean of 100 and a standard deviation of 15. Suppose one 
individual is randomly chosen. Let X =IQ of an individual. 

a. X ~ ______ ( ______ , ______ )

b. Find the probability that the person has an IQ greater than 120. Include a sketch of the graph
and write a probability statement.

c. Mensa is an organization whose members have the top 2% of all IQs. Find the minimum IQ
needed to qualify for the Mensa organization. Sketch the graph and write the probability
statement.

d. The middle 50% of IQs fall between what two values? Sketch the graph and write the
probability statement.

Exercise 6.8.3 (Solution on p. 269.) 

The percent of fat calories that a person in America consumes each day is normally distributed 
with a mean of about 36 and a standard deviation of 10. Suppose that one individual is randomly 
chosen. Let X =percent of fat calories. 

a. X ~ ______ ( ______ , ______ )

b. Find the probability that the percent of fat calories a person consumes is more than 40. Graph
the situation. Shade in the area to be determined.

c. Find the maximum number for the lower quarter of percent of fat calories. Sketch the graph
and write the probability statement.

Exercise 6.8.4 

Suppose that the distance of fly balls hit to the outfield (in baseball) is normally distributed with 
a mean of 250 feet and a standard deviation of 50 feet. 

a. If X = distance in feet for a fly ball, then X ~ ______ ( ______ , ______ )

b. If one fly ball is randomly chosen from this distribution, what is the probability that this ball
traveled fewer than 220 feet? Sketch the graph. Scale the horizontal axis X. Shade the region
corresponding to the probability. Find the probability.

c. Find the 80th percentile of the distribution of fly balls. Sketch the graph and write the
probability statement.



8 This content is available online at <http://cnx.Org/content/ml6978/l.20/>. 
Exercise 6.8.5 (Solution on p. 269.)

In China, 4-year-olds average 3 hours a day unsupervised. Most of the unsupervised children live 
in rural areas, considered safe. Suppose that the standard deviation is 1.5 hours and the amount 
of time spent alone is normally distributed. We randomly survey one Chinese 4-year-old living in 
a rural area. We are interested in the amount of time the child spends alone per day. (Source: San 
Jose Mercury News) 

a. In words, define the random variable X. X = 

b. X~ 

c. Find the probability that the child spends less than 1 hour per day unsupervised. Sketch the 

graph and write the probability statement. 

d. What percent of the children spend over 10 hours per day unsupervised? 

e. 70% of the children spend at least how long per day unsupervised? 

Exercise 6.8.6 

In the 1992 presidential election, Alaska's 40 election districts averaged 1956.8 votes per district 
for President Clinton. The standard deviation was 572.3. (There are only 40 election districts in 
Alaska.) The distribution of the votes per district for President Clinton was bell-shaped. Let X = 
number of votes for President Clinton for an election district. (Source: The World Almanac and 
Book of Facts) 

a. State the approximate distribution of X. X~ 

b. Is 1956.8 a population mean or a sample mean? How do you know? 

c. Find the probability that a randomly selected district had fewer than 1600 votes for President 

Clinton. Sketch the graph and write the probability statement. 

d. Find the probability that a randomly selected district had between 1800 and 2000 votes for 

President Clinton. 

e. Find the third quartile for votes for President Clinton. 

Exercise 6.8.7 (Solution on p. 269.) 

Suppose that the duration of a particular type of criminal trial is known to be normally distributed 
with a mean of 21 days and a standard deviation of 7 days. 

a. In words, define the random variable X. X = 

b. X~ 

c. If one of the trials is randomly chosen, find the probability that it lasted at least 24 days. Sketch 

the graph and write the probability statement. 

d. 60% of all of these types of trials are completed within how many days? 

Exercise 6.8.8 

Terri Vogel, an amateur motorcycle racer, averages 129.71 seconds per 2.5 mile lap (in a 7 lap
race) with a standard deviation of 2.28 seconds. Her lap times are normally
distributed. We are interested in one of her randomly selected laps. (Source: log book of Terri
Vogel)

a. In words, define the random variable X. X = 

b. X~ 

c. Find the percent of her laps that are completed in less than 130 seconds. 

d. The fastest 3% of her laps are under ______.

e. The middle 80% of her laps are from ______ seconds to ______ seconds.
Exercise 6.8.9 (Solution on p. 269.)

Thuy Dau, Ngoc Bui, Sam Su, and Lan Voung conducted a survey as to how long customers at 
Lucky claimed to wait in the checkout line until their turn. Let X =time in line. Below are the 
ordered real data (in minutes): 



0.50    4.25    5       6       7.25
1.75    4.25    5.25    6       7.25
2       4.25    5.25    6.25    7.25
2.25    4.25    5.5     6.25    7.75
2.25    4.5     5.5     6.5     8
2.5     4.75    5.5     6.5     8.25
2.75    4.75    5.75    6.5     9.5
3.25    4.75    5.75    6.75    9.5
3.75    5       6       6.75    9.75
3.75    5       6       6.75    10.75



Table 6.1 



a. Calculate the sample mean and the sample standard deviation.

b. Construct a histogram. Start the x-axis at -0.375 and make bar widths of 2 minutes.

c. Draw a smooth curve through the midpoints of the tops of the bars.

d. In words, describe the shape of your histogram and smooth curve.

e. Let the sample mean approximate $\mu$ and the sample standard deviation approximate $\sigma$. The
distribution of X can then be approximated by X ~ ______

f. Use the distribution in (e) to calculate the probability that a person will wait fewer than 6.1
minutes.

g. Determine the cumulative relative frequency for waiting less than 6.1 minutes.

h. Why aren't the answers to (f) and (g) exactly the same?

i. Why are the answers to (f) and (g) as close as they are?

j. If only 10 customers were surveyed instead of 50, do you think the answers to (f) and (g) would
have been closer together or farther apart? Explain your conclusion.
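The numerical parts of this exercise can be sketched in Python. The block below is an illustration only, not the textbook's method; the data are the 50 wait times from Table 6.1, and the variable names are assumptions introduced here.

```python
from statistics import NormalDist, mean, stdev

# The 50 ordered wait times (in minutes) from Table 6.1, listed column by column.
wait_times = [
    0.50, 1.75, 2, 2.25, 2.25, 2.5, 2.75, 3.25, 3.75, 3.75,
    4.25, 4.25, 4.25, 4.25, 4.5, 4.75, 4.75, 4.75, 5, 5,
    5, 5.25, 5.25, 5.5, 5.5, 5.5, 5.75, 5.75, 6, 6,
    6, 6, 6.25, 6.25, 6.5, 6.5, 6.5, 6.75, 6.75, 6.75,
    7.25, 7.25, 7.25, 7.75, 8, 8.25, 9.5, 9.5, 9.75, 10.75,
]

x_bar = mean(wait_times)    # sample mean
s = stdev(wait_times)       # sample standard deviation (divides by n - 1)

# Model probability: treat X as approximately N(x_bar, s).
p_model = NormalDist(x_bar, s).cdf(6.1)

# Empirical value: cumulative relative frequency of waits under 6.1 minutes.
rel_freq = sum(t < 6.1 for t in wait_times) / len(wait_times)

print(round(x_bar, 2), round(s, 2), round(p_model, 4), round(rel_freq, 2))
```

Comparing `p_model` with `rel_freq` shows the gap between a theoretical normal model and the observed sample, which is what the later parts of the exercise ask about.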



Exercise 6.8.10 

Suppose that Ricardo and Anita attend different colleges. Ricardo's GPA is the same as the av- 
erage GPA at his school. Anita's GPA is 0.70 standard deviations above her school average. In 
complete sentences, explain why each of the following statements may be false. 

a. Ricardo's actual GPA is lower than Anita's actual GPA. 

b. Ricardo is not passing since his z-score is zero. 

c. Anita is in the 70th percentile of students at her college. 



Exercise 6.8.11 (Solution on p. 270.) 

Below is a sample of the maximum capacity (maximum number of spectators) of sports 
stadiums. The table does not include horse racing or motor racing stadiums. (Source: 
http://en.wikipedia.org/wiki/List_of_stadiums_by_capacity) 



40,000    40,000    45,050    45,500    46,249
48,134    49,133    50,071    50,096    50,466
50,832    51,100    51,500    51,900    52,000
52,132    52,200    52,530    52,692    53,864
54,000    55,000    55,000    55,000    55,000
55,000    55,000    55,082    57,000    58,008
59,680    60,000    60,000    60,492    60,580
62,380    62,872    64,035    65,000    65,050
65,647    66,000    66,161    67,428    68,349
68,976    69,372    70,107    70,585    71,594
72,000    72,922    73,379    74,500    75,025
76,212    78,000    80,000    80,000    82,300



Table 6.2 



a. Calculate the sample mean and the sample standard deviation for the maximum capacity of 

sports stadiums (the data). 

b. Construct a histogram of the data. 

c. Draw a smooth curve through the midpoints of the tops of the bars of the histogram. 

d. In words, describe the shape of your histogram and smooth curve. 

e. Let the sample mean approximate $\mu$ and the sample standard deviation approximate $\sigma$. The
distribution of X can then be approximated by X ~ ______

f. Use the distribution in (e) to calculate the probability that the maximum capacity of sports 

stadiums is less than 67,000 spectators. 

g. Determine the cumulative relative frequency that the maximum capacity of sports stadiums is 

less than 67,000 spectators. Hint: Order the data and count the sports stadiums that have a 
maximum capacity less than 67,000. Divide by the total number of sports stadiums in the 
sample. 
h. Why aren't the answers to (f) and (g) exactly the same? 



6.8.1 Try These Multiple Choice Questions 

The questions below refer to the following: The patient recovery time from a particular surgical proce- 
dure is normally distributed with a mean of 5.3 days and a standard deviation of 2.1 days. 

Exercise 6.8.12 (Solution on p. 270.) 

What is the median recovery time? 

A. 2.7 

B. 5.3 

C. 7.4 

D. 2.1 



Exercise 6.8.13 (Solution on p. 270.)

What is the z-score for a patient who takes 10 days to recover?

A. 1.5

B. 0.2

C. 2.2

D. 7.3

Exercise 6.8.14 (Solution on p. 270.) 

What is the probability of spending more than 2 days in recovery? 

A. 0.0580 

B. 0.8447 

C. 0.0553 

D. 0.9420 

Exercise 6.8.15 (Solution on p. 270.) 

The 90th percentile for recovery times is? 

A. 8.89 

B. 7.07 

C. 7.99 

D. 4.32 

The questions below refer to the following: The length of time to find a parking space at 9 A.M. follows a 
normal distribution with a mean of 5 minutes and a standard deviation of 2 minutes. 

Exercise 6.8.16 (Solution on p. 270.) 

Based upon the above information and numerically justified, would you be surprised if it took 
less than 1 minute to find a parking space? 

A. Yes 

B. No 

C. Unable to determine 

Exercise 6.8.17 (Solution on p. 270.) 

Find the probability that it takes at least 8 minutes to find a parking space. 

A. 0.0001 

B. 0.9270 

C. 0.1862 

D. 0.0668 

Exercise 6.8.18 (Solution on p. 270.) 

Seventy percent of the time, it takes more than how many minutes to find a parking space? 

A. 1.24 

B. 2.41 

C. 3.95 

D. 6.05 

Exercise 6.8.19 (Solution on p. 270.) 

If the mean is significantly greater than the standard deviation, which of the following statements 
is true? 

I. The data cannot follow the uniform distribution.

II. The data cannot follow the exponential distribution.

III. The data cannot follow the normal distribution.

A. I only

B. II only

C. III only

D. I, II, and III
The next two questions refer to: $X \sim U(3, 13)$

Exercise 6.9.1 (Solution on p. 270.)

Explain which of the following are false and which are true.

a: $f(x) = \frac{1}{10}$, $3 \le x \le 13$

b: There is no mode.

c: The median is less than the mean.

d: P(x > 10) = P(x ≤ 6)

Exercise 6.9.2 (Solution on p. 270.)

Calculate:

a: Mean

b: Median

c: 65th percentile.

[Figure: box plot]

Exercise 6.9.3 (Solution on p. 270.)

Which of the following is true for the above box plot?

a: 25% of the data are at most 5.

b: There is about the same amount of data from 4 - 5 as there is from 5 - 7.

c: There are no data values of 3.

d: 50% of the data are 4.

Exercise 6.9.4 (Solution on p. 270.) 

If P (G | H) = P (G), then which of the following is correct? 

A: G and H are mutually exclusive events. 

B: P(G) = P(H)

C: Knowing that H has occurred will affect the chance that G will happen. 

D: G and H are independent events. 

Exercise 6.9.5 (Solution on p. 270.) 

If P(J) = 0.3, P(K) = 0.6, and J and K are independent events, then explain which are correct
and which are incorrect.

A: P(J and K) = 0
B: P(J or K) = 0.9
C: P(J or K) = 0.72
D: P(J) ≠ P(J | K)



9 This content is available online at <http://cnx.Org/content/ml6985/l.10/>. 
Exercise 6.9.6 (Solution on p. 271.)

On average, 5 students from each high school class get full scholarships to 4-year colleges. Assume 
that most high school classes have about 500 students. 

X = the number of students from a high school class that get full scholarships to 4-year school. 
Which of the following is the distribution of X? 

A. P(5) 

B. B(500,5) 

C. Exp(1/5)

D. N(5, (0.01)(0.99)/500) 
6.10 Lab 1: Normal Distribution (Lap Times) 10

Class Time: 
Names: 

6.10.1 Student Learning Outcome: 

• The student will compare and contrast empirical data and a theoretical distribution to determine if
Terri Vogel's lap times fit a continuous distribution.



6.10.2 Directions: 

Round the relative frequencies and probabilities to 4 decimal places. Carry all other decimal answers to 2 
places. 

6.10.3 Collect the Data 

1. Use the data from Terri Vogel's Log Book (Section 13.3.1: Lap Times). Use a Stratified Sampling 
Method by Lap (Races 1 - 20) and a random number generator to pick 6 lap times from each stratum. 
Record the lap times below for Laps 2-7. 



Table 6.3 

2. Construct a histogram. Make 5-6 intervals. Sketch the graph using a ruler and pencil. Scale the axes. 



"This content is available online at <http://cnx.Org/content/ml6981/l.18/>. 
[Figure: blank axes for the histogram; vertical axis: Frequency, horizontal axis: Lap Time]

Figure 6.4



3. Calculate the following. 

a. $\bar{x}$ = ______

b. s = ______

4. Draw a smooth curve through the tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. (Keep it simple. Does the graph go straight across, does it 
have a V-shape, does it have a hump in the middle or at either end, etc.?) 

6.10.4 Analyze the Distribution 

Using your sample mean, sample standard deviation, and histogram to help, what was the approximate 
theoretical distribution of the data? 

• X~ 

• How does the histogram help you arrive at the approximate distribution? 



6.10.5 Describe the Data 

Use the Data from the section titled "Collect the Data" to complete the following statements. 



• The IQR goes from ______ to ______.

• IQR = ______ (IQR = Q3 - Q1)

• The 15th percentile is:

• The 85th percentile is:

• The median is:

• The empirical probability that a randomly chosen lap time is more than 130 seconds =

• Explain the meaning of the 85th percentile of this data.
6.10.6 Theoretical Distribution

Using the theoretical distribution from the section titled "Analyze the Distribution", complete the following
statements:

• The IQR goes from ______ to ______.

• IQR = 

• The 15th percentile is: 

• The 85th percentile is: 

• The median is: 

• The probability that a randomly chosen lap time is more than 130 seconds = 

• Explain the meaning of the 85th percentile of this distribution. 

6.10.7 Discussion Questions 

• Do the data from the section titled "Collect the Data" give a close approximation to the theoretical
distribution in the section titled "Analyze the Distribution"? In complete sentences and comparing the
results in the sections titled "Describe the Data" and "Theoretical Distribution", explain why or why
not.
6.11 Lab 2: Normal Distribution (Pinkie Length)

Class Time: 
Names: 

6.11.1 Student Learning Outcomes: 

• The student will compare empirical data and a theoretical distribution to determine if data from the 
experiment follow a continuous distribution. 



6.11.2 Collect the Data 

Measure the length of your pinkie finger (in cm.) 

1. Randomly survey 30 adults. Round to the nearest 0.5 cm. 



Table 6.4 

2. Construct a histogram. Make 5-6 intervals. Sketch the graph using a ruler and pencil. Scale the axes. 



[Figure: blank axes for the histogram; vertical axis: Frequency, horizontal axis: Length of Finger]



3. Calculate the Following 



1 This content is available online at <http://cnx.Org/content/ml6980/l.16/>. 
a. $\bar{x}$ =

b. s = 

4. Draw a smooth curve through the top of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. (Keep it simple. Does the graph go straight across, does it 
have a V-shape, does it have a hump in the middle or at either end, etc.?) 



6.11.3 Analyze the Distribution 

Using your sample mean, sample standard deviation, and histogram to help, what was the approximate 
theoretical distribution of the data from the section titled "Collect the Data"? 

• X~ 

• How does the histogram help you arrive at the approximate distribution? 

6.11.4 Describe the Data 

Using the data in the section titled "Collect the Data" complete the following statements. (Hint: order the 
data) 

Remember: (IQR = Q3 - Q1)

• IQR = 

• 15th percentile is: 

• 85th percentile is: 

• Median is: 

• What is the empirical probability that a randomly chosen pinkie length is more than 6.5 cm? 

• Explain the meaning of the 85th percentile of this data.



6.11.5 Theoretical Distribution 

Using the Theoretical Distribution in the section titled "Analyze the Distribution" 

• IQR = 

• 15th percentile is: 

• 85th percentile is: 

• Median is: 

• What is the theoretical probability that a randomly chosen pinkie length is more than 6.5 cm? 

• Explain the meaning of the 85th percentile of this data. 

6.11.6 Discussion Questions 

• Do the data from the section entitled "Collect the Data" give a close approximation to the theoretical
distribution in "Analyze the Distribution"? In complete sentences and comparing the results in the
sections titled "Describe the Data" and "Theoretical Distribution", explain why or why not.
Solutions to Exercises in Chapter 6

Solution to Example 6.2, Problem 1 (p. 247) 

This z-score tells you that x = 10 is 2.5 standard deviations to the right of the mean 5. 
Solution to Example 6.2, Problem 2 (p. 247) 
z = -4. This z-score tells you that x = -3 is 4 standard deviations to the left of the mean.

Solutions to Practice: The Normal Distribution 

Solution to Exercise 6.7.3 (p. 254) 

b. 3, 0.1979

Solution to Exercise 6.7.4 (p. 254)

b. 2.8, 6, 0.7694

Solution to Exercise 6.7.5 (p. 255)

b. 0.70, 4.78 years

Solutions to Homework 
Solution to Exercise 6.8.1 (p. 256) 

a. N (66,2.5) 

b. 0.5404 

c. No 

d. Between 64.7 and 67.3 inches 

Solution to Exercise 6.8.3 (p. 256) 

a. N (36,10) 

b. 0.3446 

c. 29.3 

Solution to Exercise 6.8.5 (p. 257) 

a. the time (in hours) a 4-year-old in China spends unsupervised per day 

b. N(3,1.5) 

c. 0.0912 

d. 

e. 2.21 hours 

Solution to Exercise 6.8.7 (p. 257) 

a. The duration of a criminal trial 

b. N(21,7) 

c. 0.3341 

d. 22.77 

Solution to Exercise 6.8.9 (p. 258) 

a. The sample mean is 5.51 and the sample standard deviation is 2.15 

e. N (5.51,2.15) 

f. 0.6081 

g. 0.64 
Solution to Exercise 6.8.11 (p. 258)

a. The sample mean is 60,136.4 and the sample standard deviation is 10,468.1. 

e. N (60136.4,10468.1) 

f. 0.7440 

g. 0.7167 

Solution to Exercise 6.8.12 (p. 259) 

B 
Solution to Exercise 6.8.13 (p. 259) 

C 
Solution to Exercise 6.8.14 (p. 260) 

D 
Solution to Exercise 6.8.15 (p. 260) 

C 
Solution to Exercise 6.8.16 (p. 260) 

A 
Solution to Exercise 6.8.17 (p. 260) 

D 
Solution to Exercise 6.8.18 (p. 260) 

C 
Solution to Exercise 6.8.19 (p. 260) 

B 

Solutions to Review 

Solution to Exercise 6.9.1 (p. 262) 

a: True 

b: True 

c: False - the median and the mean are the same for this symmetric distribution 

d: True 

Solution to Exercise 6.9.2 (p. 262) 

a: 8 
b: 8 

c: $P(x < k) = 0.65 = (k - 3) \cdot \frac{1}{10}$, so $k = 9.5$ 

Solution to Exercise 6.9.3 (p. 262) 

a: False - | of the data are at most 5 

b: True - each quartile has 25% of the data 

c: False - that is unknown 

d: False - 50% of the data are 4 or less 

Solution to Exercise 6.9.4 (p. 262) 

D 

Solution to Exercise 6.9.5 (p. 262) 

A: False - J and K are independent, so they are not mutually exclusive, which would imply dependency 
(meaning P(J and K) is not 0). 
B: False - see answer C. 
C: True - P(J or K) = P(J) + P(K) - P(J and K) = P(J) + P(K) - P(J)P(K) = 0.3 + 0.6 - (0.3)(0.6) = 0.72. Note that 
P(J and K) = P(J)P(K) because J and K are independent. 
D: False - J and K are independent so P(J) = P(J | K). 

Solution to Exercise 6.9.6 (p. 263) 

A 
Chapter 7 

The Central Limit Theorem 

7.1 The Central Limit Theorem 1 

7.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Recognize the Central Limit Theorem problems. 

• Classify continuous word problems by their distributions. 

• Apply and interpret the Central Limit Theorem for Means. 

• Apply and interpret the Central Limit Theorem for Sums. 

7.1.2 Introduction 

Why are we so concerned with means? Two reasons are that they give us a middle ground for comparison 
and they are easy to calculate. In this chapter, you will study means and the Central Limit Theorem. 

The Central Limit Theorem (CLT for short) is one of the most powerful and useful ideas in all of statistics. 
There are two alternative forms of the theorem, and both are concerned with drawing finite samples of size n 
from a population with a known mean, $\mu$, and a known standard deviation, $\sigma$. The first alternative says 
that if we collect samples of size n and n is "large enough," calculate each sample's mean, and create a 
histogram of those means, then the resulting histogram will tend to have an approximate normal bell shape. 
The second alternative says that if we again collect samples of size n that are "large enough," calculate the 
sum of each sample and create a histogram, then the resulting histogram will again tend to have a normal bell shape. 

In either case, it does not matter what the distribution of the original population is, or whether you even 
need to know it. The important fact is that the sample means and the sums tend to follow the normal 
distribution. And, the rest you will learn in this chapter. 

The size of the sample, n, that is required in order to be 'large enough' depends on the original 
population from which the samples are drawn. If the original population is far from normal, then more 
observations are needed for the sample means or the sample sums to be approximately normal. Sampling is done with 
replacement. 

Optional Collaborative Classroom Activity 



1 This content is available online at <http://cnx.org/content/m16953/1.17/>. 
Do the following example in class: Suppose 8 of you roll 1 fair die 10 times, 7 of you roll 2 fair dice 10 
times, 9 of you roll 5 fair dice 10 times, and 11 of you roll 10 fair dice 10 times. 

Each time a person rolls more than one die, he/she calculates the sample mean of the faces showing. For 
example, one person might roll 5 fair dice and get a 2, 2, 3, 4, 6 on one roll. 

The mean is $\frac{2+2+3+4+6}{5} = 3.4$. The 3.4 is one mean when 5 fair dice are rolled. This same person would 
roll the 5 dice 9 more times and calculate 9 more means for a total of 10 means. 

Your instructor will pass out the dice to several people as described above. Roll your dice 10 times. For 
each roll, record the faces and find the mean. Round to the nearest 0.5. 

Your instructor (and possibly you) will produce one graph (it might be a histogram) for 1 die, one graph for 
2 dice, one graph for 5 dice, and one graph for 10 dice. Since the "mean" when you roll one die, is just the 
face on the die, what distribution do these means appear to be representing? 

Draw the graph for the means using 2 dice. Do the sample means show any kind of pattern? 

Draw the graph for the means using 5 dice. Do you see any pattern emerging? 

Finally, draw the graph for the means using 10 dice. Do you see any pattern to the graph? What can you 
conclude as you increase the number of dice? 

As the number of dice rolled increases from 1 to 2 to 5 to 10, the following is happening: 

1. The mean of the sample means remains approximately the same. 

2. The spread of the sample means (the standard deviation of the sample means) gets smaller. 

3. The graph appears steeper and thinner. 

You have just demonstrated the Central Limit Theorem (CLT). 

The Central Limit Theorem tells you that as you increase the number of dice, the sample means tend 
toward a normal distribution (the sampling distribution). 
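
If dice are not at hand, the classroom activity can be imitated on a computer. The following is a minimal sketch in Python, not part of the original activity: the function name simulate_dice_means, the seed, and the choice of 20,000 repetitions per group are all assumptions made for illustration. It rolls 1, 2, 5, and 10 fair dice many times and reports the mean and the standard deviation of the resulting sample means.

```python
import random
import statistics


def simulate_dice_means(num_dice, num_rolls, rng):
    """Return num_rolls sample means, each the average of num_dice fair dice."""
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_rolls)
    ]


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
results = {}
for num_dice in (1, 2, 5, 10):
    means = simulate_dice_means(num_dice, 20_000, rng)
    # Store (mean of the sample means, standard deviation of the sample means).
    results[num_dice] = (statistics.fmean(means), statistics.stdev(means))
    print(num_dice, round(results[num_dice][0], 2), round(results[num_dice][1], 2))
```

The first printed column is the number of dice, the second is the mean of the sample means (it should stay near 3.5), and the third is their standard deviation (it should shrink as the number of dice grows), matching observations 1 and 2 above.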

7.2 The Central Limit Theorem for Sample Means (Averages) 2 

Suppose X is a random variable with a distribution that may be known or unknown (it can be any distri- 
bution). Using a subscript that matches the random variable, suppose: 

a. $\mu_X$ = the mean of $X$ 

b. $\sigma_X$ = the standard deviation of $X$ 

If you draw random samples of size n, then as n increases, the random variable $\bar{X}$, which consists of sample 
means, tends to be normally distributed and 

$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$ 



The Central Limit Theorem for Sample Means says that if you keep drawing larger and larger samples 
(like rolling 1, 2, 5, and, finally, 10 dice) and calculating their means, the sample means form their own 
normal distribution (the sampling distribution). The normal distribution has the same mean as the 
original distribution and a variance that equals the original variance divided by n, the sample size. Here n is the 
number of values that are averaged together, not the number of times the experiment is done. 



2 This content is available online at <http://cnx.org/content/m16947/1.23/>. 
To put it more formally, if you draw random samples of size n, the distribution of the random variable 
$\bar{X}$, which consists of sample means, is called the sampling distribution of the mean. The sampling 
distribution of the mean approaches a normal distribution as n, the sample size, increases. 

The random variable $\bar{X}$ has a different z-score associated with it than the random variable $X$. $\bar{x}$ is the value 
of $\bar{X}$ in one sample. 



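$z = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}}$ (7.1) 

$\mu_X$ is both the average of $X$ and of $\bar{X}$. 

$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$ = standard deviation of $\bar{X}$ and is called the standard error of the mean. 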



Example 7.1 

An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size n = 25 
are drawn randomly from the population. 

Problem 1 

Find the probability that the sample mean is between 85 and 92. 

Solution 

Let X = one value from the original unknown population. The probability question asks you to 
find a probability for the sample mean. 

Let $\bar{X}$ = the mean of a sample of size 25. Since $\mu_X = 90$, $\sigma_X = 15$, and $n = 25$, 

then $\bar{X} \sim N\left(90, \frac{15}{\sqrt{25}}\right)$. 

Find $P(85 < \bar{x} < 92)$. Draw a graph. 

$P(85 < \bar{x} < 92) = 0.6997$ 

The probability that the sample mean is between 85 and 92 is 0.6997. 



[Graph: $P(85 < \bar{x} < 92)$, with 85, 90, and 92 marked on the horizontal axis.] 

TI-83 or 84: normalcdf(lower value, upper value, mean, standard error of the mean) 
The parameter list is abbreviated (lower value, upper value, $\mu$, $\frac{\sigma}{\sqrt{n}}$) 

normalcdf(85, 92, 90, $\frac{15}{\sqrt{25}}$) = 0.6997 
Problem 2 

Find the value that is 2 standard deviations above the expected value (it is 90) of the sample mean. 

Solution 

To find the value that is 2 standard deviations above the expected value 90, use the formula 



value = $\mu_X$ + (# of STDEVs) $\left(\frac{\sigma_X}{\sqrt{n}}\right)$ 

value = $90 + 2 \cdot \frac{15}{\sqrt{25}} = 96$ 



So, the value that is 2 standard deviations above the expected value is 96. 
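
For readers without a TI-83 or 84, both problems in Example 7.1 can be checked with a short Python sketch. This is an illustration only, assuming Python 3.8 or later, where the standard library class statistics.NormalDist plays the role of normalcdf; the variable names are chosen here and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x, n = 90, 15, 25
standard_error = sigma_x / sqrt(n)  # sigma_X / sqrt(n) = 15 / 5 = 3

# Sampling distribution of the mean, X-bar ~ N(90, 3).
x_bar = NormalDist(mu_x, standard_error)

# Problem 1: P(85 < x-bar < 92) as a difference of two cumulative probabilities.
prob_between = x_bar.cdf(92) - x_bar.cdf(85)

# Problem 2: the value 2 standard errors above the expected value.
two_sd_above = mu_x + 2 * standard_error

print(round(prob_between, 4), two_sd_above)
```

The cdf difference corresponds to normalcdf(85, 92, 90, 3) on the calculator.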



Example 7.2 

The length of time, in hours, it takes an "over 40" group of people to play one soccer match is 
normally distributed with a mean of 2 hours and a standard deviation of 0.5 hours. A sample of 
size n = 50 is drawn randomly from the population. 

Problem 

Find the probability that the sample mean is between 1.8 hours and 2.3 hours. 

Solution 

Let X = the time, in hours, it takes to play one soccer match. 

The probability question asks you to find a probability for the sample mean time, in hours, it 
takes to play one soccer match. 

Let $\bar{X}$ = the mean time, in hours, it takes to play one soccer match. 

If $\mu_X$ = ____, $\sigma_X$ = ____, and n = ____, then $\bar{X} \sim N$(____, ____) 

by the Central Limit Theorem for Means. 

$\mu_X = 2$, $\sigma_X = 0.5$, $n = 50$, and $\bar{X} \sim N\left(2, \frac{0.5}{\sqrt{50}}\right)$ 

Find $P(1.8 < \bar{x} < 2.3)$. Draw a graph. 

$P(1.8 < \bar{x} < 2.3) = 0.9977$ 

normalcdf(1.8, 2.3, 2, $\frac{0.5}{\sqrt{50}}$) = 0.9977 

The probability that the mean time is between 1.8 hours and 2.3 hours is ____. 
7.3 The Central Limit Theorem for Sums 3 

Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution) 
and suppose: 

a. $\mu_X$ = the mean of $X$ 

b. $\sigma_X$ = the standard deviation of $X$ 

If you draw random samples of size n, then as n increases, the random variable $\Sigma X$, which consists of sums, 
tends to be normally distributed and 

$\Sigma X \sim N\left(n \cdot \mu_X, \sqrt{n} \cdot \sigma_X\right)$ 

The Central Limit Theorem for Sums says that if you keep drawing larger and larger samples and taking 
their sums, the sums form their own normal distribution (the sampling distribution) which approaches a 
normal distribution as the sample size increases. The normal distribution has a mean equal to the original 
mean multiplied by the sample size and a standard deviation equal to the original standard deviation 
multiplied by the square root of the sample size. 

The random variable $\Sigma X$ has the following z-score associated with it: 

a. $\Sigma x$ is one sum. 

b. $z = \frac{\Sigma x - n \cdot \mu_X}{\sqrt{n} \cdot \sigma_X}$ 

a. $n \cdot \mu_X$ = the mean of $\Sigma X$ 

b. $\sqrt{n} \cdot \sigma_X$ = standard deviation of $\Sigma X$ 

Example 7.3 

An unknown distribution has a mean of 90 and a standard deviation of 15. A sample of size 80 is 
drawn randomly from the population. 

Problem 

a. Find the probability that the sum of the 80 values (or the total of the 80 values) is more than 

7500. 

b. Find the sum that is 1.5 standard deviations above the mean of the sums. 

Solution 

Let X = one value from the original unknown population. The probability question asks you to 
find a probability for the sum (or total of) 80 values. 

$\Sigma X$ = the sum or total of 80 values. Since $\mu_X = 90$, $\sigma_X = 15$, and $n = 80$, then 
$\Sigma X \sim N\left(80 \cdot 90, \sqrt{80} \cdot 15\right)$ 

• mean of the sums = $n \cdot \mu_X = (80)(90) = 7200$ 

• standard deviation of the sums = $\sqrt{n} \cdot \sigma_X = \sqrt{80} \cdot 15$ 

• sum of 80 values = $\Sigma x = 7500$ 

a: Find $P(\Sigma x > 7500)$ 



3 This content is available online at <http://cnx.org/content/m16948/1.16/>. 
$P(\Sigma x > 7500) = 0.0127$ 

[Graph: normal curve for the sums, with 7200 and 7500 marked on the horizontal axis.] 

normalcdf(lower value, upper value, mean of sums, stdev of sums) 
The parameter list is abbreviated (lower, upper, $n \cdot \mu_X$, $\sqrt{n} \cdot \sigma_X$) 
normalcdf(7500, 1E99, $80 \cdot 90$, $\sqrt{80} \cdot 15$) = 0.0127 
Reminder: 1E99 = $10^{99}$. Press the EE key for E. 



b: Find $\Sigma x$ where z = 1.5: 

$\Sigma x = n \cdot \mu_X + z \cdot \sqrt{n} \cdot \sigma_X = (80)(90) + (1.5)(\sqrt{80})(15) = 7401.2$ 
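
The two parts of Example 7.3 can be reproduced in Python as a cross-check. The sketch below is illustrative only and assumes Python 3.8 or later for statistics.NormalDist; the variable names are invented here.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x, n = 90, 15, 80

# Distribution of the sums: N(n * mu_X, sqrt(n) * sigma_X).
sums = NormalDist(n * mu_x, sqrt(n) * sigma_x)

# Part a: P(sum > 7500) is the upper tail, so subtract the cdf from 1.
prob_more_than_7500 = 1 - sums.cdf(7500)

# Part b: the sum that lies z = 1.5 standard deviations above the mean of the sums.
sum_at_z = sums.mean + 1.5 * sums.stdev

print(round(prob_more_than_7500, 4), round(sum_at_z, 1))
```

Taking 1 minus the cdf replaces the calculator's trick of using 1E99 as the upper bound.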

7.4 Using the Central Limit Theorem 4 

It is important for you to understand when to use the CLT. If you are being asked to find the probability of 
the mean, use the CLT for the mean. If you are being asked to find the probability of a sum or total, use the 
CLT for sums. This also applies to percentiles for means and sums. 

NOTE: If you are being asked to find the probability of an individual value, do not use the CLT. 
Use the distribution of its random variable. 



7.4.1 Examples of the Central Limit Theorem 
Law of Large Numbers 

The Law of Large Numbers says that if you take samples of larger and larger size from any population, 
then the mean $\bar{x}$ of the sample tends to get closer and closer to $\mu$. From the Central Limit Theorem, we 
know that as n gets larger and larger, the sample means follow a normal distribution. The larger n gets, the 
smaller the standard deviation gets. (Remember that the standard deviation for $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$.) This means that 
the sample mean $\bar{x}$ must be close to the population mean $\mu$. We can say that $\mu$ is the value that the sample 
means approach as n gets larger. The Central Limit Theorem illustrates the Law of Large Numbers. 
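
The shrinking standard error can be seen numerically. The sketch below is an illustration, not part of the original text: it assumes the population is a single fair die (mean 3.5, standard deviation $\sqrt{35/12}$), uses an arbitrary fixed seed, and tracks one growing sample rather than many separate samples.

```python
import random
from math import sqrt

rng = random.Random(1)   # fixed seed (arbitrary) so the run is repeatable
sigma = sqrt(35 / 12)    # standard deviation of one fair die
running_total = 0
checkpoints = {}

for i in range(1, 100_001):
    running_total += rng.randint(1, 6)
    if i in (10, 100, 1_000, 10_000, 100_000):
        # Store (sample mean so far, theoretical standard error sigma / sqrt(n)).
        checkpoints[i] = (running_total / i, sigma / sqrt(i))

for size, (sample_mean, std_error) in checkpoints.items():
    print(size, round(sample_mean, 3), round(std_error, 4))
```

As the sample size grows, the standard error column falls toward 0 and the sample mean settles near the population mean of 3.5.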

Central Limit Theorem for the Mean and Sum Examples 



4 This content is available online at <http://cnx.org/content/m16958/1.21/>. 
Example 7.4 

A study involving stress is done on a college campus among the students. The stress scores follow 
a uniform distribution with the lowest stress score equal to 1 and the highest equal to 5. Using a 
sample of 75 students, find: 

1. The probability that the mean stress score for the 75 students is less than 2. 

2. The 90th percentile for the mean stress score for the 75 students. 

3. The probability that the total of the 75 stress scores is less than 200. 

4. The 90th percentile for the total stress score for the 75 students. 

Let X = one stress score. 

Problems 1. and 2. ask you to find a probability or a percentile for a mean. Problems 3 and 4 ask 
you to find a probability or a percentile for a total or sum. The sample size, n, is equal to 75. 

Since the individual stress scores follow a uniform distribution, $X \sim U(1, 5)$ where a = 1 and 
b = 5 (See Continuous Random Variables 5 for the uniform). 

$\mu_X = \frac{a + b}{2} = \frac{1 + 5}{2} = 3$ 

$\sigma_X = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(5 - 1)^2}{12}} = 1.15$ 

For problems 1. and 2., let $\bar{X}$ = the mean stress score for the 75 students. Then, 

$\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$ where n = 75. 

Problem 1 

Find $P(\bar{x} < 2)$. Draw the graph. 

Solution 

$P(\bar{x} < 2) = 0$ 

The probability that the mean stress score is less than 2 is about 0. 

[Graph: $P(\bar{x} < 2)$] 

normalcdf(1, 2, 3, $\frac{1.15}{\sqrt{75}}$) = 0 

REMINDER: The smallest stress score is 1. Therefore, the smallest mean for 75 stress scores is 1. 



"Continuous Random Variables: Introduction" <http://cnx.org/content/ml6808/latest/> 
Problem 2 

Find the 90th percentile for the mean of 75 stress scores. Draw a graph. 

Solution 

Let k = the 90th percentile. 

Find k where $P(\bar{x} < k) = 0.90$. 
k = 3.2 

[Graph: $P(\bar{x} < k) = 0.90$] 

The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of 
75 stress scores are at most 3.2 and 10% are at least 3.2. 

invNorm(0.90, 3, $\frac{1.15}{\sqrt{75}}$) = 3.2 

For problems 3 and 4, let $\Sigma X$ = the sum of the 75 stress scores. Then, 

$\Sigma X \sim N\left[(75) \cdot (3), \sqrt{75} \cdot 1.15\right]$ 

Problem 3 

Find $P(\Sigma x < 200)$. Draw the graph. 

Solution 

The mean of the sum of 75 stress scores is $75 \cdot 3 = 225$ 

The standard deviation of the sum of 75 stress scores is $\sqrt{75} \cdot 1.15 = 9.96$ 

$P(\Sigma x < 200) \approx 0$ 

The probability that the total of 75 scores is less than 200 is about 0. 

normalcdf(75, 200, $75 \cdot 3$, $\sqrt{75} \cdot 1.15$) = 0 

REMINDER: The smallest total of 75 stress scores is 75 since the smallest single score is 1. 



Problem 4 

Find the 90th percentile for the total of 75 stress scores. Draw a graph. 

Solution 

Let k = the 90th percentile. 

Find k where $P(\Sigma x < k) = 0.90$. 
k = 237.8 

[Graph: $P(\Sigma x < k) = 0.90$, with 225 marked on the horizontal axis.] 

The 90th percentile for the sum of 75 scores is about 237.8. This tells us that 90% of all the sums of 
75 scores are no more than 237.8 and 10% are no less than 237.8. 

invNorm(0.90, $75 \cdot 3$, $\sqrt{75} \cdot 1.15$) = 237.8 
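
All four parts of Example 7.4 can be worked in one short Python sketch. It is offered as a cross-check only and makes a few assumptions: Python 3.8 or later for statistics.NormalDist, variable names invented here, and lower tails measured from negative infinity rather than from 1 or 75 as in the calculator entries (the difference is negligible). It also keeps the unrounded $\sigma_X$ instead of 1.15, so the last digits can differ slightly from the text.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 1, 5, 75
mu_x = (a + b) / 2                 # mean of U(1, 5) = 3
sigma_x = sqrt((b - a) ** 2 / 12)  # standard deviation of U(1, 5), about 1.15

mean_dist = NormalDist(mu_x, sigma_x / sqrt(n))     # distribution of X-bar
sum_dist = NormalDist(n * mu_x, sqrt(n) * sigma_x)  # distribution of the sums

p_mean_below_2 = mean_dist.cdf(2)     # Problem 1
mean_p90 = mean_dist.inv_cdf(0.90)    # Problem 2 (plays the role of invNorm)
p_sum_below_200 = sum_dist.cdf(200)   # Problem 3
sum_p90 = sum_dist.inv_cdf(0.90)      # Problem 4

print(p_mean_below_2, round(mean_p90, 2), round(p_sum_below_200, 4), round(sum_p90, 1))
```

Carried to four decimal places, the probability in Problem 3 is about 0.006 rather than exactly 0, which is consistent with the "about 0" reported in the solution.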



Example 7.5 

Suppose that a market research analyst for a cell phone company conducts a study of their cus- 
tomers who exceed the time allowance included on their basic cell phone contract; the analyst 
finds that for those people who exceed the time included in their basic contract, the excess time 
used follows an exponential distribution with a mean of 22 minutes. 

Consider a random sample of 80 customers who exceed the time allowance included in their basic 
cell phone contract. 

Let X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted 
time allowance. 



$X \sim Exp\left(\frac{1}{22}\right)$. From Chapter 5, we know that $\mu = 22$ and $\sigma = 22$. 

Let $\bar{X}$ = the mean excess time used by a sample of n = 80 customers who exceed their contracted 
time allowance. 

$\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the CLT for Sample Means 

Problem 1 

Using the CLT to find Probability: 

a. Find the probability that the mean excess time used by the 80 customers in the sample is longer 
than 20 minutes. This is asking us to find $P(\bar{x} > 20)$. Draw the graph. 

b. Suppose that one customer who exceeds the time limit for his cell phone contract is randomly 
selected. Find the probability that this individual customer's excess time is longer than 20 
minutes. This is asking us to find $P(x > 20)$. 

c. Explain why the probabilities in (a) and (b) are different. 

Solution 
Part a. 

Find: $P(\bar{x} > 20)$ 

$P(\bar{x} > 20) = 0.7919$ using normalcdf(20, 1E99, 22, $\frac{22}{\sqrt{80}}$) 

The probability is 0.7919 that the mean excess time used is more than 20 minutes, for a sample of 
80 customers who exceed their contracted time allowance. 

[Graph: normal curve with 20 and 22 marked on the horizontal axis.] 

REMINDER: 1E99 = $10^{99}$ and -1E99 = $-10^{99}$. Press the EE key for E. Or just use 10^99 instead of 
1E99. 

Part b. 

Find $P(x > 20)$. Remember to use the exponential distribution for an individual: $X \sim Exp(1/22)$. 

$P(X > 20) = e^{-(1/22) \cdot 20}$ or $e^{-0.04545 \cdot 20} = 0.4029$ 

Part c. Explain why the probabilities in (a) and (b) are different. 

$P(x > 20) = 0.4029$ but $P(\bar{x} > 20) = 0.7919$ 

The probabilities are not equal because we use different distributions to calculate the probability 
for individuals and for means. 

When asked to find the probability of an individual value, use the stated distribution of its random 
variable; do not use the CLT. Use the CLT with the normal distribution when you are 
being asked to find the probability for a mean. 
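
The contrast between parts (a) and (b) can be made concrete with a few lines of Python. This is a sketch for illustration, assuming Python 3.8 or later; the variable names are invented here. The individual probability uses the exponential tail formula $e^{-x/\mu}$, while the probability for the mean uses the normal distribution from the CLT.

```python
from math import exp, sqrt
from statistics import NormalDist

mean_excess, n = 22, 80

# Part b: one individual, X ~ Exp(1/22), so P(X > 20) = e^(-20/22).
p_individual = exp(-20 / mean_excess)

# Part a: mean of 80 customers, X-bar ~ N(22, 22 / sqrt(80)) by the CLT.
x_bar = NormalDist(mean_excess, mean_excess / sqrt(n))
p_mean = 1 - x_bar.cdf(20)

print(round(p_individual, 4), round(p_mean, 4))
```

The two results differ because the first describes a single skewed exponential value and the second describes an average of 80 values, which is far less spread out.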
Problem 2 

Using the CLT to find Percentiles: 

Find the 95th percentile for the sample mean excess time for samples of 80 customers who exceed 
their basic contract time allowances. Draw a graph. 

Solution 

Let k = the 95th percentile. Find k where $P(\bar{x} < k) = 0.95$ 

k = 26.0 using invNorm(0.95, 22, $\frac{22}{\sqrt{80}}$) = 26.0 

[Graph: $P(\bar{x} < k) = 0.95$, with 22 marked on the horizontal axis.] 




The 95th percentile for the sample mean excess time used is about 26.0 minutes for random 
samples of 80 customers who exceed their contractual allowed time. 

95% of such samples would have means under 26 minutes; only 5% of such samples would have 
means above 26 minutes. 



NOTE: (HISTORICAL): Normal Approximation to the Binomial 

Historically, being able to compute binomial probabilities was one of the most important applications of the 
Central Limit Theorem. Binomial probabilities were displayed in a table in a book with a small value for n 
(say, 20). To calculate the probabilities with large values of n, you had to use the binomial formula which 
could be very complicated. Using the Normal Approximation to the Binomial simplified the process. To 
compute the Normal Approximation to the Binomial, take a simple random sample from a population. You 
must meet the conditions for a binomial distribution: 

• There are a certain number n of independent trials. 
• The outcomes of any trial are success or failure. 
• Each trial has the same probability of a success p. 

Recall that if X is the binomial random variable, then X ~ B(n, p). The shape of the binomial distribution 
needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must 
both be greater than five (np > 5 and nq > 5; the approximation is better if they are both greater than or 
equal to 10). Then the binomial can be approximated by the normal distribution with mean $\mu = np$ and 
standard deviation $\sigma = \sqrt{npq}$. Remember that q = 1 - p. In order to get the best approximation, add 0.5 to 
x or subtract 0.5 from x (use x + 0.5 or x - 0.5). The number 0.5 is called the continuity correction factor. 
Example 7.6 

Suppose in a local Kindergarten through 12th grade (K - 12) school district, 53 percent of the 
population favor a charter school for grades K - 5. A simple random sample of 300 is surveyed. 

1. Find the probability that at least 150 favor a charter school. 

2. Find the probability that at most 160 favor a charter school. 

3. Find the probability that more than 155 favor a charter school. 

4. Find the probability that less than 147 favor a charter school. 

5. Find the probability that exactly 175 favor a charter school. 

Let X = the number that favor a charter school for grades K - 5. X ~ B(n, p) where n = 300 and 
p = 0.53. Since np > 5 and nq > 5, use the normal approximation to the binomial. The formulas 
for the mean and standard deviation are $\mu = np$ and $\sigma = \sqrt{npq}$. The mean is 159 and the standard 
deviation is 8.6447. The random variable for the normal distribution is Y. Y ~ N(159, 8.6447). See 
The Normal Distribution for help with calculator instructions. 

For Problem 1., you include 150 so $P(x \geq 150)$ has normal approximation $P(Y \geq 149.5) = 0.8641$. 

normalcdf(149.5, 10^99, 159, 8.6447) = 0.8641 

For Problem 2., you include 160 so $P(x \leq 160)$ has normal approximation $P(Y \leq 160.5) = 0.5689$. 

normalcdf(0, 160.5, 159, 8.6447) = 0.5689 

For Problem 3., you exclude 155 so $P(x > 155)$ has normal approximation $P(Y > 155.5) = 0.6572$. 

normalcdf(155.5, 10^99, 159, 8.6447) = 0.6572 

For Problem 4., you exclude 147 so $P(x < 147)$ has normal approximation $P(Y < 146.5) = 0.0741$. 

normalcdf(0, 146.5, 159, 8.6447) = 0.0741 

For Problem 5., $P(x = 175)$ has normal approximation $P(174.5 < Y < 175.5) = 0.0083$. 

normalcdf(174.5, 175.5, 159, 8.6447) = 0.0083 

Because of calculators and computer software that easily let you calculate binomial probabilities 
for large values of n, it is not necessary to use the Normal Approximation to the Binomial 
provided you have access to these technology tools. Most school labs have Microsoft Excel, an 
example of computer software that calculates binomial probabilities. Many students have access 
to the TI-83 or 84 series calculators and they easily calculate probabilities for the binomial. In an 
Internet browser, if you type in "binomial probability distribution calculation," you can find at 
least one online calculator for the binomial. 

For this example, the probabilities are calculated using the binomial (n = 300 and p = 0.53) below. 
Compare the binomial and normal distribution answers. See Discrete Random Variables for help 
with calculator instructions for the binomial. 

$P(x \geq 150)$: 1 - binomialcdf(300, 0.53, 149) = 0.8641 

$P(x \leq 160)$: binomialcdf(300, 0.53, 160) = 0.5684 

$P(x > 155)$: 1 - binomialcdf(300, 0.53, 155) = 0.6576 

$P(x < 147)$: binomialcdf(300, 0.53, 146) = 0.0742 

$P(x = 175)$: (You use the binomial pdf.) binomialpdf(300, 0.53, 175) = 0.0083 
^Contributions made to Example 2 by Roberta Bloom 
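
The comparison between the exact binomial and its normal approximation can also be run in Python. The sketch below is illustrative, assuming Python 3.8 or later (for math.comb and statistics.NormalDist); the helper names binom_pmf and binom_cdf are invented here and are not calculator functions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.53
q = 1 - p
approx = NormalDist(n * p, sqrt(n * p * q))  # Y ~ N(159, 8.6447)


def binom_pmf(k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * q**(n - k)


def binom_cdf(k):
    """Exact binomial probability P(X <= k)."""
    return sum(binom_pmf(i) for i in range(k + 1))


# Each entry holds (exact binomial, normal approximation with continuity correction).
comparisons = {
    "at least 150": (1 - binom_cdf(149), 1 - approx.cdf(149.5)),
    "at most 160": (binom_cdf(160), approx.cdf(160.5)),
    "more than 155": (1 - binom_cdf(155), 1 - approx.cdf(155.5)),
    "less than 147": (binom_cdf(146), approx.cdf(146.5)),
    "exactly 175": (binom_pmf(175), approx.cdf(175.5) - approx.cdf(174.5)),
}

for label, (exact, normal) in comparisons.items():
    print(f"{label}: binomial {exact:.4f}, normal {normal:.4f}")
```

Each normal entry shifts the cutoff by 0.5 in the direction that includes or excludes the boundary value, which is the continuity correction described above.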
7.5 Summary of Formulas 6 

Formula 7.1: Central Limit Theorem for Sample Means 
$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$    The Mean ($\bar{X}$): $\mu_X$ 

Formula 7.2: Central Limit Theorem for Sample Means Z-Score and Standard Error of the Mean 
$z = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}}$    Standard Error of the Mean (Standard Deviation ($\bar{X}$)): $\frac{\sigma_X}{\sqrt{n}}$ 

Formula 7.3: Central Limit Theorem for Sums 
$\Sigma X \sim N\left[(n) \cdot \mu_X, \sqrt{n} \cdot \sigma_X\right]$    Mean for Sums ($\Sigma X$): $n \cdot \mu_X$ 

Formula 7.4: Central Limit Theorem for Sums Z-Score and Standard Deviation for Sums 
$z = \frac{\Sigma x - n \cdot \mu_X}{\sqrt{n} \cdot \sigma_X}$    Standard Deviation for Sums ($\Sigma X$): $\sqrt{n} \cdot \sigma_X$ 



6 This content is available online at <http://cnx.org/content/m16956/1.8/>. 
7.6 Practice: The Central Limit Theorem 7 

7.6.1 Student Learning Outcomes 

• The student will calculate probabilities using the Central Limit Theorem. 

7.6.2 Given 

Yoonie is a personnel manager in a large corporation. Each month she must review 16 of the employees. 
From past experience, she has found that the reviews take her approximately 4 hours each to do with a 
population standard deviation of 1.2 hours. Let $X$ be the random variable representing the time it takes 
her to complete one review. Assume $X$ is normally distributed. Let $\bar{X}$ be the random variable representing 
the mean time to complete the 16 reviews. Let $\Sigma X$ be the total time it takes Yoonie to complete all of the 
month's reviews. Assume that the 16 reviews represent a random set of reviews. 

7.6.3 Distribution 

Complete the distributions. 

1. $X \sim$ ____ 

2. $\bar{X} \sim$ ____ 

3. $\Sigma X \sim$ ____ 



7.6.4 Graphing Probability 

For each problem below: 

a. Sketch the graph. Label and scale the horizontal axis. Shade the region corresponding to the probability. 

b. Calculate the value. 



Exercise 7.6.1 (Solution on p. 308.) 

Find the probability that one review will take Yoonie from 3.5 to 4.25 hours. 

a. 

b. P(____ < x < ____) = ____ 



Exercise 7.6.2 (Solution on p. 308.) 

Find the probability that the mean of a month's reviews will take Yoonie from 3.5 to 4.25 hrs. 



7 This content is available online at <http://cnx.org/content/m16954/1.12/>. 
a. 

b. P(____) = ____ 



Exercise 7.6.3 (Solution on p. 308.) 

Find the 95th percentile for the mean time to complete one month's reviews. 

a. 

b. The 95th Percentile= 



Exercise 7.6.4 (Solution on p. 308.) 

Find the probability that the sum of the month's reviews takes Yoonie from 60 to 65 hours. 




a. 



b. The Probability= 



Exercise 7.6.5 (Solution on p. 308.) 

Find the 95th percentile for the sum of the month's reviews. 

a. 

b. The 95th percentile= 



7.6.5 Discussion Question 

Exercise 7.6.6 

What causes the probabilities in Exercise 7.6.1 and Exercise 7.6.2 to differ? 
7.7 Homework 8 

Exercise 7.7.1 (Solution on p. 308.) 

$X \sim N(60, 9)$. Suppose that you form random samples of 25 from this distribution. Let $\bar{X}$ be the 
random variable of averages. Let $\Sigma X$ be the random variable of sums. For c - f, sketch the graph, 
shade the region, label and scale the horizontal axis for $\bar{X}$, and find the probability. 

a. Sketch the distributions of $X$ and $\bar{X}$ on the same graph. 

b. $\bar{X} \sim$ 

c. $P(\bar{x} < 60)$ = 

d. Find the 30th percentile for the mean. 

e. $P(56 < \bar{x} < 62)$ = 

f. $P(18 < \bar{x} < 58)$ = 

g. $\Sigma X \sim$ 

h. Find the minimum value for the upper quartile for the sum. 

i. $P(1400 < \Sigma x < 1550)$ = 

Exercise 7.7.2 

Determine which of the following are true and which are false. Then, in complete sentences, 
justify your answers. 

a. When the sample size is large, the mean of $\bar{X}$ is approximately equal to the mean of $X$. 

b. When the sample size is large, $\bar{X}$ is approximately normally distributed. 

c. When the sample size is large, the standard deviation of $\bar{X}$ is approximately the same as the 
standard deviation of $X$. 

Exercise 7.7.3 (Solution on p. 308.) 

The percent of fat calories that a person in America consumes each day is normally distributed 
with a mean of about 36 and a standard deviation of about 10. Suppose that 16 individuals are 
randomly chosen. 

Let $\bar{X}$ = average percent of fat calories. 

a. $\bar{X} \sim$ ____ ( ____ , ____ ) 

b. For the group of 16, find the probability that the average percent of fat calories consumed is 
more than 5. Graph the situation and shade in the area to be determined. 

c. Find the first quartile for the average percent of fat calories. 

Exercise 7.7.4 

Previously, De Anza statistics students estimated that the amount of change daytime statistics 
students carry is exponentially distributed with a mean of $0.88. Suppose that we randomly pick 
25 daytime statistics students. 

a. In words, $X$ = 

b. $X \sim$ ____ 

c. In words, $\bar{X}$ = 

d. $\bar{X} \sim$ ____ ( ____ , ____ ) 

e. Find the probability that an individual had between $0.80 and $1.00. Graph the situation and 

shade in the area to be determined. 

f. Find the probability that the average of the 25 students was between $0.80 and $1.00. Graph the 

situation and shade in the area to be determined. 



8 This content is available online at <http://cnx.org/content/m16952/1.24/>. 
g. Explain why there is a difference in (e) and (f). 

Exercise 7.7.5 (Solution on p. 308.) 

Suppose that the distance of fly balls hit to the outfield (in baseball) is normally distributed with 
a mean of 250 feet and a standard deviation of 50 feet. We randomly sample 49 fly balls. 

a. If $\bar{X}$ = average distance in feet for 49 fly balls, then $\bar{X} \sim$ ____ ( ____ , ____ ) 

b. What is the probability that the 49 balls traveled an average of less than 240 feet? Sketch the 
graph. Scale the horizontal axis for $\bar{X}$. Shade the region corresponding to the probability. 
Find the probability. 

c. Find the 80th percentile of the distribution of the average of 49 fly balls. 

Exercise 7.7.6 

Suppose that the weight of open boxes of cereal in a home with children is uniformly distributed 
from 2 to 6 pounds. We randomly survey 64 homes with children. 

a. In words, $X$ = 

b. $X \sim$ 

c. $\mu_X$ = 

d. $\sigma_X$ = 

e. In words, $\Sigma X$ = 

f. $\Sigma X \sim$ 

g. Find the probability that the total weight of open boxes is less than 250 pounds. 

h. Find the 35th percentile for the total weight of open boxes of cereal. 

Exercise 7.7.7 (Solution on p. 308.) 

Suppose that the duration of a particular type of criminal trial is known to have a mean of 21 days 
and a standard deviation of 7 days. We randomly sample 9 trials. 

a. In words, $\Sigma X$ = 

b. $\Sigma X \sim$ 

c. Find the probability that the total length of the 9 trials is at least 225 days. 

d. 90 percent of the total of 9 of these types of trials will last at least how long? 

Exercise 7.7.8 

According to the Internal Revenue Service, the average length of time for an individual to com- 
plete (record keep, learn, prepare, copy, assemble and send) IRS Form 1040 is 10.53 hours (without 
any attached schedules). The distribution is unknown. Let us assume that the standard deviation 
is 2 hours. Suppose we randomly sample 36 taxpayers. 

a. In words, $X$ = 

b. In words, $\bar{X}$ = 

c. $\bar{X} \sim$ 

d. Would you be surprised if the 36 taxpayers finished their Form 1040s in an average of more 
than 12 hours? Explain why or why not in complete sentences. 

e. Would you be surprised if one taxpayer finished his Form 1040 in more than 12 hours? In a 
complete sentence, explain why. 

Exercise 7.7.9 (Solution on p. 308.) 

Suppose that a category of world class runners are known to run a marathon (26 miles) in an 
average of 145 minutes with a standard deviation of 14 minutes. Consider 49 of the races. 

Let $\bar{X}$ = the average of the 49 races. 

a. $\bar{X} \sim$ 

b. Find the probability that the runner will average between 142 and 146 minutes in these 49 
marathons. 

c. Find the 80th percentile for the average of these 49 marathons. 

d. Find the median of the average running times. 

Exercise 7.7.10 

The attention span of a two year-old is exponentially distributed with a mean of about 8 minutes. 
Suppose we randomly survey 60 two year-olds. 

a. In words, X = 

b. X~ 

c. In words, X = 

d. X~ 

e. Before doing any calculations, which do you think will be higher? Explain why. 

i. the probability that an individual attention span is less than 10 minutes; or 
ii. the probability that the average attention span for the 60 children is less than 10 minutes? 
Why? 

f. Calculate the probabilities in part (e). 

g. Explain why the distribution for X̄ is not exponential. 

Exercise 7.7.11 (Solution on p. 309.) 

Suppose that the length of research papers is uniformly distributed from 10 to 25 pages. We 
survey a class in which 55 research papers were turned in to a professor. The 55 research papers 
are considered a random collection of all papers. We are interested in the average length of the 
research papers. 

a. In words, X = 

b. X~ 

c. μ_X = 

d. σ_X = 

e. In words, X̄ = 

f. X̄ ~ 

g. In words, ΣX = 

h. ΣX ~ 

i. Without doing any calculations, do you think that it's likely that the professor will need to read
a total of more than 1050 pages? Why? 
j. Calculate the probability that the professor will need to read a total of more than 1050 pages. 
k. Why is it so unlikely that the average length of the papers will be less than 12 pages? 

Exercise 7.7.12 

The length of songs in a collector's CD collection is uniformly distributed from 2 to 3.5 minutes. 
Suppose we randomly pick 5 CDs from the collection. There is a total of 43 songs on the 5 CDs. 

a. In words, X = 

b. X ~ 

c. In words, X̄ = 

d. X̄ ~ 

e. Find the first quartile for the average song length. 

f. The IQR (interquartile range) for the average song length is from _____ to _____. 

Exercise 7.7.13 (Solution on p. 309.) 

Salaries for teachers in a particular elementary school district are normally distributed with a 
mean of $44,000 and a standard deviation of $6500. We randomly survey 10 teachers from that 
district. 

a. In words, X = 

b. In words, X̄ = 

c. X̄ ~ 

d. In words, ΣX = 

e. ΣX ~ 

f. Find the probability that the teachers earn a total of over $400,000. 

g. Find the 90th percentile for an individual teacher's salary. 
h. Find the 90th percentile for the average teachers' salary. 

i. If we surveyed 70 teachers instead of 10, graphically, how would that change the distribution
for X̄? 

j. If each of the 70 teachers received a $3000 raise, graphically, how would that change the
distribution for X̄? 

Exercise 7.7.14 

The distribution of income in some Third World countries is considered wedge shaped (many 
very poor people, very few middle income people, and few to many wealthy people). Suppose we 
pick a country with a wedge distribution. Let the average salary be $2000 per year with a standard 
deviation of $8000. We randomly survey 1000 residents of that country. 

a. In words, X = 

b. In words, X̄ = 

c. X̄ ~ 

d. How is it possible for the standard deviation to be greater than the average? 

e. Why is it more likely that the average of the 1000 residents will be from $2000 to $2100 than
from $2100 to $2200? 

Exercise 7.7.15 (Solution on p. 309.) 

The average length of a maternity stay in a U.S. hospital is said to be 2.4 days with a standard
deviation of 0.9 days. We randomly survey 80 women who recently bore children in a U.S. hospital. 

a. In words, X = 

b. In words, X̄ = 

c. X̄ ~ 

d. In words, ΣX = 

e. ΣX ~ 

f. Is it likely that an individual stayed more than 5 days in the hospital? Why or why not? 

g. Is it likely that the average stay for the 80 women was more than 5 days? Why or why not? 
h. Which is more likely: 

i. an individual stayed more than 5 days; or 

ii. the average stay of 80 women was more than 5 days? 

i. If we were to sum up the women's stays, is it likely that, collectively they spent more than a 
year in the hospital? Why or why not? 

Exercise 7.7.16 

In 1940 the average size of a U.S. farm was 174 acres. Let's say that the standard deviation was 55 
acres. Suppose we randomly survey 38 farmers from 1940. (Source: U.S. Dept. of Agriculture) 

a. In words, X = 

b. In words, X̄ = 

c. X̄ ~ 

d. The IQR for X̄ is from _____ acres to _____ acres. 

Exercise 7.7.17 (Solution on p. 309.) 

The stock closing prices of 35 U.S. semiconductor manufacturers are given below. (Source: Wall 
Street Journal) 

8.625; 30.25; 27.625; 46.75; 32.875; 18.25; 5; 0.125; 2.9375; 6.875; 28.25; 24.25; 21; 1.5; 30.25; 71; 43.5; 
49.25; 2.5625; 31; 16.5; 9.5; 18.5; 18; 9; 10.5; 16.625; 1.25; 18; 12.875; 7; 12.875; 2.875; 60.25; 29.25 

a. In words, X = 

b. i. x̄ = 

ii. s_x = 
iii. n = 

c. Construct a histogram of the distribution of the averages. Start at x = −0.0005. Make bar
widths of 10. 

d. In words, describe the distribution of stock prices. 

e. Randomly average 5 stock prices together. (Use a random number generator.) Continue averaging
5 pieces together until you have 10 averages. List those 10 averages. 

f. Use the 10 averages from (e) to calculate: 

i. x̄ = 

ii. s_x̄ = 

g. Construct a histogram of the distribution of the averages. Start at x = −0.0005. Make bar
widths of 10. 
h. Does this histogram look like the graph in (c)? 

i. In 1 - 2 complete sentences, explain why the graphs either look the same or look different. 
j. Based upon the theory of the Central Limit Theorem, X̄ ~ 

Exercise 7.7.18 

Use the Initial Public Offering data (Section 13.3.2: Stock Prices) (see "Table of Contents") to do this
problem. 

a. In words, X = 

b. i. μ_X = 

ii. σ_X = 
iii. n = 

c. Construct a histogram of the distribution. Start at x = −0.50. Make bar widths of $5. 

d. In words, describe the distribution of stock prices. 

e. Randomly average 5 stock prices together. (Use a random number generator.) Continue averaging
5 pieces together until you have 15 averages. List those 15 averages. 

f. Use the 15 averages from (e) to calculate the following: 

i. x̄ = 

ii. s_x̄ = 

g. Construct a histogram of the distribution of the averages. Start at x = −0.50. Make bar widths
of $5. 
h. Does this histogram look like the graph in (c)? Explain any differences. 
i. In 1 - 2 complete sentences, explain why the graphs either look the same or look different. 
j. Based upon the theory of the Central Limit Theorem, X̄ ~ 

7.7.1 Try these multiple choice questions (Exercises 19 - 23). 

The next two questions refer to the following information: The time to wait for a particular rural bus 
is distributed uniformly from 0 to 75 minutes. 100 riders are randomly sampled to learn how long they 
waited. 

Exercise 7.7.19 (Solution on p. 309.) 

The 90th percentile sample average wait time (in minutes) for a sample of 100 riders is: 

A. 315.0 

B. 40.3 

C. 38.5 

D. 65.2 

Exercise 7.7.20 (Solution on p. 309.) 

Would you be surprised, based upon numerical calculations, if the sample average wait time (in 
minutes) for 100 riders was less than 30 minutes? 

A. Yes 

B. No 

C. There is not enough information. 

Exercise 7.7.21 (Solution on p. 309.) 

Which of the following is NOT TRUE about the distribution for averages? 

A. The mean, median and mode are equal 

B. The area under the curve is one 

C. The curve never touches the x-axis 

D. The curve is skewed to the right 

The next three questions refer to the following information: The cost of unleaded gasoline in the Bay Area 
once followed an unknown distribution with a mean of $4.59 and a standard deviation of $0.10. Sixteen gas 
stations from the Bay Area are randomly chosen. We are interested in the average cost of gasoline for the 
16 gas stations. 

Exercise 7.7.22 (Solution on p. 309.) 

The distribution to use for the average cost of gasoline for the 16 gas stations is 

A. X̄ ~ N (4.59, 0.10) 

B. X̄ ~ N (4.59, 0.10/√16) 

C. X ~ N (4.59, ^ 



D. X-N 4.59 



*' 0.10 J 



Exercise 7.7.23 (Solution on p. 309.) 

What is the probability that the average price for 16 gas stations is over $4.69? 

A. Almost zero 

B. 0.1587 

C. 0.0943 

D. Unknown 

Exercise 7.7.24 (Solution on p. 309.) 

Find the probability that the average price for 30 gas stations is less than $4.55. 

A. 0.6554 

B. 0.3446 

C. 0.0142 

D. 0.9858 

E. 

Exercise 7.7.25 (Solution on p. 309.) 

For the Charter School Problem (Example 6) in Central Limit Theorem: Using the Central Limit 
Theorem, calculate the following using the normal approximation to the binomial. 

A. Find the probability that less than 100 favor a charter school for grades K - 5. 

B. Find the probability that 170 or more favor a charter school for grades K - 5. 

C. Find the probability that no more than 140 favor a charter school for grades K - 5. 

D. Find the probability that there are fewer than 130 that favor a charter school for grades K - 5. 

E. Find the probability that exactly 150 favor a charter school for grades K - 5. 

If you have access to either an appropriate calculator or computer software, try calculating these 
probabilities using the technology. Also try the suggestion at the bottom of Central 
Limit Theorem: Using the Central Limit Theorem for finding a website that calculates binomial 
probabilities. 

Exercise 7.7.26 (Solution on p. 309.) 

Four friends, Janice, Barbara, Kathy and Roberta, decided to carpool together to get to school. 
Each day the driver would be chosen by randomly selecting one of the four names. They carpool 
to school for 96 days. Use the normal approximation to the binomial to calculate the following 
probabilities. Round the standard deviation to 4 decimal places. 

A. Find the probability that Janice is the driver at most 20 days. 

B. Find the probability that Roberta is the driver more than 16 days. 

C. Find the probability that Barbara drives exactly 24 of those 96 days. 

If you have access to either an appropriate calculator or computer software, try calculating these 
probabilities using the technology. Also try the suggestion at the bottom of Central 
Limit Theorem: Using the Central Limit Theorem for finding a website that calculates binomial 
probabilities. 
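
For readers using computer software, the following is a minimal Python sketch (not part of the original exercise) of how the normal approximation to the binomial might be compared with the exact binomial probability. It uses the carpool setting of Exercise 7.7.26 (n = 96, p = 0.25). The function names `binom_cdf` and `normal_approx_cdf` are illustrative choices, and the 0.5 continuity correction is the usual convention for this approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with N(np, sqrt(npq)) and a 0.5 continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p = 96, 0.25  # 96 carpool days, each friend chosen with probability 1/4
approx_at_most_20 = normal_approx_cdf(20, n, p)
exact_at_most_20 = binom_cdf(20, n, p)
print(f"normal approximation: {approx_at_most_20:.4f}")
print(f"exact binomial:       {exact_at_most_20:.4f}")
```

The two printed values should be close but not identical, which is the point of calling the normal result an approximation.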
**Exercise 24 contributed by Roberta Bloom 

7.8 Review 9 

The next three questions refer to the following information: Richard's Furniture Company delivers fur- 
niture from 10 A.M. to 2 P.M. continuously and uniformly. We are interested in how long (in hours) past 
the 10 A.M. start time that individuals wait for their delivery. 

Exercise 7.8.1 (Solution on p. 310.) 

X ~ 

A. U (0,4) 

B. U (10, 2) 

C. Exp (2) 

D. N (2,1) 



Exercise 7.8.2 (Solution on p. 310.) 

The average wait time is: 

A. 1 hour 

B. 2 hours 

C. 2.5 hours 

D. 4 hours 

Exercise 7.8.3 (Solution on p. 310.) 

Suppose that it is now past noon on a delivery day. The probability that a person must wait at 
least 1 1/2 more hours is: 

A. 1/4 

B. 1/2 

C. 3/4 

D. 3/8 


Exercise 7.8.4 (Solution on p. 310.) 

Given: X ~ Exp (1/3) 

a. Find P (x > 1) 

b. Calculate the minimum value for the upper quartile. 

c. Find P I .t 



Exercise 7.8.5 (Solution on p. 310.) 

• 40% of full-time students took 4 years to graduate 

• 30% of full-time students took 5 years to graduate 

• 20% of full-time students took 6 years to graduate 

• 10% of full-time students took 7 years to graduate 

The expected time for full-time students to graduate is: 

A. 4 years 

B. 4.5 years 

C. 5 years 

D. 5.5 years 



9 This content is available online at <http://cnx.org/content/m16955/1.12/>. 

Exercise 7.8.6 (Solution on p. 310.) 

Which of the following distributions is described by the following example? 

Many people can run a short distance of under 2 miles, but as the distance increases, fewer people 
can run that far. 

A. Binomial 

B. Uniform 

C. Exponential 

D. Normal 

Exercise 7.8.7 (Solution on p. 310.) 

The length of time to brush one's teeth is generally thought to be exponentially distributed with 
a mean of § minutes. Find the probability that a randomly selected person brushes his/her teeth 
less than | minutes. 

A. 0.5 

B- ! 

C. 0.43 

D. 0.63 

Exercise 7.8.8 (Solution on p. 310.) 

Which distribution accurately describes the following situation? 

The chance that a teenage boy regularly gives his mother a kiss goodnight (and he should!!) is 
about 20%. Fourteen teenage boys are randomly surveyed. 

X = the number of teenage boys that regularly give their mother a kiss goodnight 

A. B (14,0.20) 

B. P(2.8) 

C. N (2.8, 2.24) 

D. Exp (fjig) 

Exercise 7.8.9 (Solution on p. 310.) 

Which distribution accurately describes the following situation? 

A 2008 report on technology use states that approximately 20 percent of U.S. households have 
never sent an e-mail. (Source: http://www.webguild.org/2008/05/20-percent-of-americans-have-never-used-email.php) 
Suppose that we select a random sample of fourteen U.S. households. 

X = the number of households in a 2008 sample of 14 households that have never sent an e-mail 

A. B (14,0.20) 

B. P(2.8) 

C. N (2.8, 2.24) 

D. Exp (^ 
**Exercise 9 contributed by Roberta Bloom 

7.9 Lab 1: Central Limit Theorem (Pocket Change) 10 

Class Time: 
Names: 

7.9.1 Student Learning Outcomes: 

• The student will demonstrate and compare properties of the Central Limit Theorem. 

NOTE: This lab works best when sampling from several classes and combining data. 



7.9.2 Collect the Data 

1. Count the change in your pocket. (Do not include bills.) 

2. Randomly survey 30 classmates. Record the values of the change. 



Table 7.1 

3. Construct a histogram. Make 5-6 intervals. Sketch the graph using a ruler and pencil. Scale the axes. 

10 This content is available online at <http://cnx.org/content/m16950/1.10/>. 

Figure 7.1: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Value of the Change) 



4. Calculate the following (n = 1; surveying one person at a time): 

a. x̄ = 

b. s = 

5. Draw a smooth curve through the tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. 

7.9.3 Collecting Averages of Pairs 

Repeat steps 1-5 (of the section above titled "Collect the Data") with one exception. Instead of recording 
the change of 30 classmates, record the average change of 30 pairs. 

1. Randomly survey 30 pairs of classmates. Record the values of the average of their change. 



Table 7.2 

2. Construct a histogram. Scale the axes using the same scaling you did for the section titled "Collect 
the Data". Sketch the graph using a ruler and a pencil. 

Figure 7.2: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Value of the Change) 



3. Calculate the following (n = 2; surveying two people at a time): 

a. x̄ = 

b. s = 

4. Draw a smooth curve through tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. 



7.9.4 Collecting Averages of Groups of Five 

Repeat steps 1-5 (of the section titled "Collect the Data") with one exception. Instead of recording the 
change of 30 classmates, record the average change of 30 groups of 5. 

1. Randomly survey 30 groups of 5 classmates. Record the values of the average of their 
change. 



Table 7.3 

2. Construct a histogram. Scale the axes using the same scaling you did for the section titled "Collect the 
Data". Sketch the graph using a ruler and a pencil. 



Figure 7.3: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Value of the Change) 



3. Calculate the following (n = 5; surveying five people at a time): 

a. x̄ = 

b. s = 

4. Draw a smooth curve through tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. 



7.9.5 Discussion Questions 

1. As n changed, why did the shape of the distribution of the data change? Use 1-2 complete sentences 
to explain what happened. 

2. In the section titled "Collect the Data", what was the approximate distribution of the data? X ~ 

3. In the section titled "Collecting Averages of Groups of Five", what was the approximate distribution 
of the averages? X̄ ~ 

4. In 1 - 2 complete sentences, explain any differences in your answers to the previous two questions. 

7.10 Lab 2: Central Limit Theorem (Cookie Recipes) 11 

Class Time: 
Names: 

7.10.1 Student Learning Outcomes: 

• The student will demonstrate and compare properties of the Central Limit Theorem. 

7.10.2 Given: 

X = length of time (in days) that a cookie recipe lasted at the Olmstead Homestead. (Assume that each of 
the different recipes makes the same quantity of cookies.) 



Recipe #    X      Recipe #    X      Recipe #    X      Recipe #    X
    1       1         16       2         31       3         46       2
    2       5         17       2         32       4         47       2
    3       2         18       4         33       5         48      11
    4       5         19       6         34       6         49       5
    5       6         20       1         35       6         50       5
    6       1         21       6         36       1         51       4
    7       2         22       5         37       1         52       6
    8       6         23       2         38       2         53       5
    9       5         24       5         39       1         54       1
   10       2         25       1         40       6         55       1
   11       5         26       6         41       1         56       2
   12       1         27       4         42       6         57       4
   13       1         28       1         43       2         58       3
   14       3         29       6         44       6         59       6
   15       2         30       2         45       2         60       5

Table 7.4 

Calculate the following: 

a. μ_X = 

b. σ_X = 




11 This content is available online at <http://cnx.org/content/m16945/1.11/>. 

7.10.3 Collect the Data 

Use a random number generator to randomly select 4 samples of size n = 5 from the given population. 
Record your samples below. Then, for each sample, calculate the mean to the nearest tenth. Record them in 
the spaces provided. Record the sample means for the rest of the class. 

1. Complete the table: 





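           Sample 1     Sample 2     Sample 3     Sample 4     Sample means from other groups:
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
Means:     x̄ = ____     x̄ = ____     x̄ = ____     x̄ = ____

Table 7.5 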



2. Calculate the following: 

a. x̄ = 

b. s_x̄ = 



3. Again, use a random number generator to randomly select 4 samples from the population. This time, 
make the samples of size n = 10. Record the samples below. As before, for each sample, calculate the 
mean to the nearest tenth. Record them in the spaces provided. Record the sample means for the rest 
of the class. 





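           Sample 1     Sample 2     Sample 3     Sample 4     Sample means from other groups:
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
           ________     ________     ________     ________     _______________________________
Means:     x̄ = ____     x̄ = ____     x̄ = ____     x̄ = ____

Table 7.6 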



4. Calculate the following: 

a. x̄ = 

b. s_x̄ = 

5. For the original population, construct a histogram. Make intervals with bar width = 1 day. Sketch the 
graph using a ruler and pencil. Scale the axes. 



Figure 7.4: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Time (days)) 



6. Draw a smooth curve through the tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. 



7.10.4 Repeat the Procedure for n=5 

1. For the sample of n = 5 days averaged together, construct a histogram of the averages (your means 
together with the means of the other groups). Make intervals with bar widths = 1/2 day. Sketch the 
graph using a ruler and pencil. Scale the axes. 

Figure 7.5: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Time (days)) 



2. Draw a smooth curve through the tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. 



7.10.5 Repeat the Procedure for n=10 

1. For the sample of n = 10 days averaged together, construct a histogram of the averages (your means 
together with the means of the other groups). Make intervals with bar widths = 1/2 day. Sketch the 
graph using a ruler and pencil. Scale the axes. 

Figure 7.6: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Time (days)) 



2. Draw a smooth curve through the tops of the bars of the histogram. Use 1-2 complete sentences to 
describe the general shape of the curve. 



7.10.6 Discussion Questions 

1. Compare the three histograms you have made, the one for the population and the two for the sample 
means. In three to five sentences, describe the similarities and differences. 

2. State the theoretical (according to the CLT) distributions for the sample means. 

a. n = 5: X̄ ~ 

b. n = 10: X̄ ~ 

3. Are the sample means for n = 5 and n = 10 "close" to the theoretical mean, μ_X? Explain why or why 
not. 

4. Which of the two distributions of sample means has the smaller standard deviation? Why? 

5. As n changed, why did the shape of the distribution of the data change? Use 1-2 complete sentences 
to explain what happened. 



NOTE: This lab was designed and contributed by Carol Olmstead. 

Solutions to Exercises in Chapter 7 

Solutions to Practice: The Central Limit Theorem 

Solution to Exercise 7.6.1 (p. 287) 

b. 3.5,4.25,0.2441 

Solution to Exercise 7.6.2 (p. 287) 

b. 0.7499 

Solution to Exercise 7.6.3 (p. 288) 

b. 4.49 hours 

Solution to Exercise 7.6.4 (p. 288) 

b. 0.3802 

Solution to Exercise 7.6.5 (p. 288) 

b: 71.90 

Solutions to Homework 

Solution to Exercise 7.7.1 (p. 290) 

b. X̄ ~ N (60, 9/√25) 

c. 0.5000 

d. 59.06 

e. 0.8536 

f. 0.1333 
h. 1530.35 
i. 0.8536 

Solution to Exercise 7.7.3 (p. 290) 

,N(36,^ 

b. 1 

c. 34.31 

Solution to Exercise 7.7.5 (p. 291) 

a - N ( 250 '7l 

b. 0.0808 

c. 256.01 feet 

Solution to Exercise 7.7.7 (p. 291) 

a. The total length of time for 9 criminal trials 

b. N (189,21) 

c. 0.0432 

d. 162.09 

Solution to Exercise 7.7.9 (p. 291) 

a. N (145, 14/√49) 

b. 0.6247 

c. 146.68 

d. 145 minutes 

Solution to Exercise 7.7.11 (p. 292) 

b. U (10,25) 

c. 17.5 

d. √(225/12) = 4.3301 

f. N (17.5, 0.5839) 
h. N (962.5,32.11) 
j. 0.0032 

Solution to Exercise 7.7.13 (p. 293) 

c. N (44,000, 6500/√10) 

e. N (440,000, (√10) (6500)) 

f. 0.9742 

g. $52,330 
h. $46,634 

Solution to Exercise 7.7.15 (p. 293) 

c. N (2.4, 0.9/√80) 

e. N (192,8.05) 
h. Individual 

Solution to Exercise 7.7.17 (p. 294) 

b. $20.71; $17.31; 35 

d. Exponential distribution, X ~ Exp (1/20.71) 

f. $20.71; $11.14 

j. N (20.71, 17.31/√5) 

Solution to Exercise 7.7.19 (p. 295) 

B 
Solution to Exercise 7.7.20 (p. 295) 

A 
Solution to Exercise 7.7.21 (p. 295) 

D 
Solution to Exercise 7.7.22 (p. 295) 

B 
Solution to Exercise 7.7.23 (p. 295) 

A 
Solution to Exercise 7.7.24 (p. 295) 

C 
Solution to Exercise 7.7.25 (p. 296) 

C. 0.0162 
E. 0.0268 

Solution to Exercise 7.7.26 (p. 296) 

A. 0.2047 

B. 0.9615 

C. 0.0938 

Solutions to Review 

Solution to Exercise 7.8.1 (p. 297) 

A 

Solution to Exercise 7.8.2 (p. 297) 

B 

Solution to Exercise 7.8.3 (p. 297) 

A 

Solution to Exercise 7.8.4 (p. 297) 

a. 0.7165 

b. 4.16 

c. 

Solution to Exercise 7.8.5 (p. 297) 

C 

Solution to Exercise 7.8.6 (p. 298) 

C 

Solution to Exercise 7.8.7 (p. 298) 

D 

Solution to Exercise 7.8.8 (p. 298) 

A 

Solution to Exercise 7.8.9 (p. 298) 

A 



Chapter 8 

Confidence Intervals 

8.1 Confidence Intervals 1 

8.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Calculate and interpret confidence intervals for one population mean and one population proportion. 

• Interpret the student-t probability distribution as the sample size changes. 

• Discriminate between problems applying the normal and the student-t distributions. 

8.1.2 Introduction 

Suppose you are trying to determine the mean rent of a two-bedroom apartment in your town. You might 
look in the classified section of the newspaper, write down several rents listed, and average them together. 
You would have obtained a point estimate of the true mean. If you are trying to determine the percent of 
times you make a basket when shooting a basketball, you might count the number of shots you make and 
divide that by the number of shots you attempted. In this case, you would have obtained a point estimate 
for the true proportion. 

We use sample data to make generalizations about an unknown population. This part of statistics is called 
inferential statistics. The sample data help us to make an estimate of a population parameter. We realize 
that the point estimate is most likely not the exact value of the population parameter, but close to it. After 
calculating point estimates, we construct confidence intervals in which we believe the parameter lies. 

In this chapter, you will learn to construct and interpret confidence intervals. You will also learn a new 
distribution, the Student's-t, and how it is used with these intervals. Throughout the chapter, it is important 
to keep in mind that the confidence interval is a random variable. It is the parameter that is fixed. 

If you worked in the marketing department of an entertainment company, you might be interested in the 
mean number of compact discs (CDs) a consumer buys per month. If so, you could conduct a survey 
and calculate the sample mean, x̄, and the sample standard deviation, s. You would use x̄ to estimate 
the population mean and s to estimate the population standard deviation. The sample mean, x̄, is the 
point estimate for the population mean, μ. The sample standard deviation, s, is the point estimate for the 
population standard deviation, σ. 



1 This content is available online at <http://cnx.org/content/m16967/1.16/>. 

Each of x̄ and s is also called a statistic. 

A confidence interval is another type of estimate but, instead of being just one number, it is an interval 
of numbers. The interval of numbers is a range of values calculated from a given set of sample data. The 
confidence interval is likely to include an unknown population parameter. 

Suppose for the CD example we do not know the population mean μ but we do know that the population 
standard deviation is σ = 1 and our sample size is 100. Then by the Central Limit Theorem, the standard 
deviation for the sample mean is 

σ/√n = 1/√100 = 0.1 

The Empirical Rule, which applies to bell-shaped distributions, says that in approximately 95% of the 
samples, the sample mean, x̄, will be within two standard deviations of the population mean μ. For our CD 
example, two standard deviations is (2) (0.1) = 0.2. The sample mean x̄ is likely to be within 0.2 units of 
μ. 

Because x̄ is within 0.2 units of μ, which is unknown, then μ is likely to be within 0.2 units of x̄ in 95% 
of the samples. The population mean μ is contained in an interval whose lower number is calculated by 
taking the sample mean and subtracting two standard deviations ((2) (0.1)) and whose upper number is 
calculated by taking the sample mean and adding two standard deviations. In other words, μ is between 
x̄ − 0.2 and x̄ + 0.2 in 95% of all the samples. 

For the CD example, suppose that a sample produced a sample mean x̄ = 2. Then the unknown population 
mean μ is between 

x̄ − 0.2 = 2 − 0.2 = 1.8 and x̄ + 0.2 = 2 + 0.2 = 2.2 

We say that we are 95% confident that the unknown population mean number of CDs is between 1.8 and 
2.2. The 95% confidence interval is (1.8, 2.2). 
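
The arithmetic of the CD example can be retraced in a few lines of Python. This is only a sketch of the calculation described above, using the values given in the text (population standard deviation 1, sample size 100, sample mean 2); the variable names are illustrative.

```python
from math import sqrt

sigma = 1.0   # known population standard deviation (from the CD example)
n = 100       # sample size (from the CD example)
x_bar = 2.0   # sample mean produced by the sample (from the CD example)

standard_error = sigma / sqrt(n)   # standard deviation of the sample mean
margin = 2 * standard_error        # Empirical Rule: two standard deviations
interval = (x_bar - margin, x_bar + margin)
print(interval)                    # approximately (1.8, 2.2)
```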

The 95% confidence interval implies two possibilities. Either the interval (1.8, 2.2) contains the true mean μ 
or our sample produced an x̄ that is not within 0.2 units of the true mean μ. The second possibility happens 
for only 5% of all the samples (100% - 95%). 
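
One way to see the "5% of all the samples" claim is to simulate it. The sketch below is an illustration under stated assumptions, not part of the original text: it pretends the true mean is known (`true_mu = 2.0`, a hypothetical value), draws many samples of size 100 from a normal population with standard deviation 1, and counts how often the interval "sample mean plus or minus 0.2" contains the true mean.

```python
import random
from math import sqrt

def coverage(true_mu, sigma, n, trials, seed=0):
    """Fraction of simulated samples whose interval (mean - 2*SE, mean + 2*SE) contains true_mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = 2 * sigma / sqrt(n)       # two standard deviations of the sample mean
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= true_mu <= sample_mean + margin:
            hits += 1
    return hits / trials

# true_mu is a hypothetical value chosen for the simulation; in practice it is unknown.
rate = coverage(true_mu=2.0, sigma=1.0, n=100, trials=2000)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should land near 0.95, with the remaining samples being those whose sample mean fell more than 0.2 units from the true mean.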

Remember that a confidence interval is created for an unknown population parameter like the population 
mean, μ. Confidence intervals for some parameters have the form 

(point estimate - margin of error, point estimate + margin of error) 

The margin of error depends on the confidence level or percentage of confidence. 

When you read newspapers and journals, some reports will use the phrase "margin of error." Other reports 
will not use that phrase, but include a confidence interval as the point estimate + or - the margin of error. 
These are two ways of expressing the same concept. 

NOTE: Although the text only covers symmetric confidence intervals, there are non-symmetric 
confidence intervals (for example, a confidence interval for the standard deviation). 



8.1.3 Optional Collaborative Classroom Activity 

Have your instructor record the number of meals each student in your class eats out in a week. Assume 
that the standard deviation is known to be 3 meals. Construct an approximate 95% confidence interval for 
the true mean number of meals students eat out each week. 



1. Calculate the sample mean. 

2. σ = 3 and n = the number of students surveyed. 

3. Construct the interval (x̄ − 2·σ/√n, x̄ + 2·σ/√n). 

We say we are approximately 95% confident that the true average number of meals that students eat out in 
a week is between _____ and _____. 

8.2 Confidence Interval, Single Population Mean, Population Standard 
Deviation Known, Normal 2 

8.2.1 Calculating the Confidence Interval 

To construct a confidence interval for a single unknown population mean μ, where the population standard 
deviation is known, we need x̄ as an estimate for μ and we need the margin of error. Here, the 
margin of error is called the error bound for a population mean (abbreviated EBM). The sample mean x̄ is 
the point estimate of the unknown population mean μ. 

The confidence interval estimate will have the form: 

(point estimate - error bound, point estimate + error bound) or, in symbols, (x̄ − EBM, x̄ + EBM) 

The margin of error depends on the confidence level (abbreviated CL). The confidence level is often considered 
the probability that the calculated confidence interval estimate will contain the true population 
parameter. However, it is more accurate to state that the confidence level is the percent of confidence intervals 
that contain the true population parameter when repeated samples are taken. Most often, the person 
constructing the confidence interval chooses a confidence level of 90% or higher because that person 
wants to be reasonably certain of his or her conclusions. 

There is another probability called alpha (α). α is related to the confidence level CL. α is the probability that 
the interval does not contain the unknown population parameter. 
Mathematically, α + CL = 1. 

Example 8.1 

Suppose we have collected data from a sample. We know the sample mean but we do not know 
the mean for the entire population. 

The sample mean is 7 and the error bound for the mean is 2.5. 

x̄ = 7 and EBM = 2.5. 

The confidence interval is (7 − 2.5, 7 + 2.5); calculating the values gives (4.5, 9.5). 

If the confidence level (CL) is 95%, then we say that "We estimate with 95% confidence that the 
true value of the population mean is between 4.5 and 9.5." 

A confidence interval for a population mean with a known standard deviation is based on the fact that the 
sample means follow an approximately normal distribution. Suppose that our sample has a mean of x̄ = 10 
and we have constructed the 90% confidence interval (5, 15) where EBM = 5. 

To get a 90% confidence interval, we must include the central 90% of the probability of the normal distribution. 
If we include the central 90%, we leave out a total of α = 10% in both tails, or 5% in each tail, of the 
normal distribution. 



2 This content is available online at <http://cnx.org/content/m16962/1.23/>. 

Figure: Confidence Level (CL) = 0.90; x̄ = 10; EBM = 5; x̄ − EBM = 5; x̄ + EBM = 15 

μ is believed to be in the interval (5, 15) with 90% confidence. 

To capture the central 90%, we must go out 1.645 "standard deviations" on either side of the calculated 
sample mean. 1.645 is the z-score from a Standard Normal probability distribution that puts an area of 0.90 
in the center, an area of 0.05 in the far left tail, and an area of 0.05 in the far right tail. 

It is important that the "standard deviation" used must be appropriate for the parameter we are estimating. So in this section, we need to use the standard deviation that applies to sample means, which is $\frac{\sigma}{\sqrt{n}}$. $\frac{\sigma}{\sqrt{n}}$ is commonly called the "standard error of the mean" in order to clearly distinguish the standard deviation for a mean from the population standard deviation $\sigma$.

In summary, as a result of the Central Limit Theorem: 

• $\bar{X}$ is normally distributed, that is, $\bar{X} \sim N\left(\mu_X, \frac{\sigma}{\sqrt{n}}\right)$.

• When the population standard deviation $\sigma$ is known, we use a Normal distribution to calculate the error bound.

Calculating the Confidence Interval: 

To construct a confidence interval estimate for an unknown population mean, we need data from a random 
sample. The steps to construct and interpret the confidence interval are: 

• Calculate the sample mean $\bar{x}$ from the sample data. Remember, in this section, we already know the population standard deviation $\sigma$.

• Find the Z-score that corresponds to the confidence level. 

• Calculate the error bound EBM.

• Construct the confidence interval.

• Write a sentence that interprets the estimate in the context of the situation in the problem. (Explain what the confidence interval means, in the words of the problem.)

We will first examine each step in more detail, and then illustrate the process with some examples. 

Finding z for the stated Confidence Level 

When we know the population standard deviation $\sigma$, we use a standard normal distribution to calculate the error bound EBM and construct the confidence interval. We need to find the value of z that puts an area equal to the confidence level (in decimal form) in the middle of the standard normal distribution $Z \sim N(0, 1)$.

The confidence level, CL, is the area in the middle of the standard normal distribution. $CL = 1 - \alpha$. So $\alpha$ is the area that is split equally between the two tails. Each of the tails contains an area equal to $\frac{\alpha}{2}$.

The z-score that has an area to the right of $\frac{\alpha}{2}$ is denoted by $z_{\alpha/2}$.

For example, when CL = 0.95, then $\alpha = 0.05$ and $\frac{\alpha}{2} = 0.025$; we write $z_{\alpha/2} = z_{0.025}$.

The area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 1 - 0.025 = 0.975.

$z_{\alpha/2} = z_{0.025} = 1.96$, using a calculator, computer or a Standard Normal probability table.

Using the TI83, TI83+ or TI84+ calculator: invNorm(0.975, 0, 1) = 1.96 

CALCULATOR NOTE: Remember to use the area to the LEFT of $z_{\alpha/2}$; in this chapter the last two inputs in the invNorm command are 0, 1 because you are using a Standard Normal Distribution $Z \sim N(0, 1)$.
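
For readers working on a computer instead of a TI calculator, the same lookup can be sketched in Python. This is only an illustration, assuming Python 3.8 or later (for `statistics.NormalDist`); the helper name `z_critical` is ours and does not come from the text.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_(alpha/2): the z-score with area alpha/2 to its right."""
    alpha = 1 - confidence_level
    # Like invNorm, inv_cdf expects the area to the LEFT of the z-score.
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)

print(round(z_critical(0.95), 2))   # 1.96
print(round(z_critical(0.90), 3))   # 1.645
```

As with invNorm, the argument passed to `inv_cdf` is the left-hand area $1 - \frac{\alpha}{2}$, not the confidence level itself.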

EBM: Error Bound 

The error bound formula for an unknown population mean $\mu$ when the population standard deviation $\sigma$ is known is

• $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Constructing the Confidence Interval 

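• The confidence interval estimate has the format $(\bar{x} - EBM, \bar{x} + EBM)$.

The graph gives a picture of the entire situation.

$CL + \frac{\alpha}{2} + \frac{\alpha}{2} = CL + \alpha = 1$.

[Figure: normal curve with area $CL = 1 - \alpha$ between $\bar{x} - EBM$ and $\bar{x} + EBM$, and area $\frac{\alpha}{2}$ in each tail.]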

Writing the Interpretation 

The interpretation should clearly state the confidence level (CL), explain what population parameter is being estimated (here, a population mean), and state the confidence interval (both endpoints). "We estimate with ___% confidence that the true population mean (include context of the problem) is between ___ and ___ (include appropriate units)."

Example 8.2 

Suppose scores on exams in statistics are normally distributed with an unknown population mean 
and a population standard deviation of 3 points. A random sample of 36 scores is taken and gives 
a sample mean (sample mean score) of 68. Find a confidence interval estimate for the population 
mean exam score (the mean score on all exams). 

Problem 

Find a 90% confidence interval for the true (population) mean of statistics exam scores. 
Solution

• You can use technology to directly calculate the confidence interval.

• The first solution is shown step-by-step (Solution A). 

• The second solution uses the TI-83, 83+ and 84+ calculators (Solution B). 

Solution A 

To find the confidence interval, you need the sample mean, $\bar{x}$, and the EBM.

$\bar{x} = 68$

$EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

$\sigma = 3$; $n = 36$; the confidence level is 90% (CL = 0.90)

CL = 0.90 so $\alpha = 1 - CL = 1 - 0.90 = 0.10$

$\frac{\alpha}{2} = 0.05$, so $z_{\alpha/2} = z_{0.05}$

The area to the right of $z_{0.05}$ is 0.05 and the area to the left of $z_{0.05}$ is 1 - 0.05 = 0.95.

$z_{\alpha/2} = z_{0.05} = 1.645$

using invNorm(0.95, 0, 1) on the TI-83, 83+, 84+ calculators. This can also be found using appropriate commands on other calculators, using a computer, or using a probability table for the Standard Normal distribution.

$EBM = 1.645 \cdot \frac{3}{\sqrt{36}} = 0.8225$

$\bar{x} - EBM = 68 - 0.8225 = 67.1775$
$\bar{x} + EBM = 68 + 0.8225 = 68.8225$
The 90% confidence interval is (67.1775, 68.8225).
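
The step-by-step calculation can also be sketched in Python. This is a minimal illustration, not part of the original solution; the function name `z_interval` is an assumption of ours. Because it uses an unrounded z-score instead of 1.645, the endpoints differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence_level):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2)
    ebm = z * sigma / sqrt(n)                 # error bound for the mean
    return sample_mean - ebm, sample_mean + ebm

lower, upper = z_interval(sample_mean=68, sigma=3, n=36, confidence_level=0.90)
print(round(lower, 3), round(upper, 3))   # 67.178 68.822
```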

Solution B 

Using a function of the TI-83, TI-83+ or TI-84 calculators: 

Press STAT and arrow over to TESTS. 

Arrow down to 7:ZInterval.

Press ENTER.

Arrow to Stats and press ENTER.

Arrow down and enter 3 for $\sigma$, 68 for $\bar{x}$, 36 for n, and .90 for C-level.

Arrow down to Calculate and press ENTER. 

The confidence interval is (to 3 decimal places) (67.178, 68.822). 

Interpretation 

We estimate with 90% confidence that the true population mean exam score for all statistics stu- 
dents is between 67.18 and 68.82. 

Explanation of 90% Confidence Level 

90% of all confidence intervals constructed in this way contain the true mean statistics exam score. 
For example, if we constructed 100 of these confidence intervals, we would expect 90 of them to 
contain the true population mean exam score. 
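
The "repeated samples" reading of the confidence level can be illustrated with a small simulation. The sketch below is ours, not the text's: it assumes a hypothetical true mean of 68 (in the example the true mean is unknown), draws many samples of 36 normally distributed scores with $\sigma = 3$, and counts how often the 90% interval captures the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(true_mean, sigma, n, confidence_level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebm = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - ebm <= true_mean <= x_bar + ebm:
            hits += 1
    return hits / trials

# true_mean=68 is a made-up value used only to drive the simulation.
print(coverage(true_mean=68, sigma=3, n=36, confidence_level=0.90, trials=2000))
```

The printed fraction should land near 0.90, though it will not equal 0.90 exactly because the simulation uses a finite number of samples.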
8.2.2 Changing the Confidence Level or Sample Size

Example 8.3: Changing the Confidence Level 

Suppose we change the original problem by using a 95% confidence level. Find a 95% confidence 
interval for the true (population) mean statistics exam score. 

Solution 

To find the confidence interval, you need the sample mean, $\bar{x}$, and the EBM.

$\bar{x} = 68$

$EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

$\sigma = 3$; $n = 36$; the confidence level is 95% (CL = 0.95)
CL = 0.95 so $\alpha = 1 - CL = 1 - 0.95 = 0.05$
$\frac{\alpha}{2} = 0.025$, so $z_{\alpha/2} = z_{0.025}$
The area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 1 - 0.025 = 0.975.

$z_{\alpha/2} = z_{0.025} = 1.96$

using invNorm(0.975, 0, 1) on the TI-83, 83+, 84+ calculators. (This can also be found using appropriate commands on other calculators, using a computer, or using a probability table for the Standard Normal distribution.)

$EBM = 1.96 \cdot \frac{3}{\sqrt{36}} = 0.98$
$\bar{x} - EBM = 68 - 0.98 = 67.02$

$\bar{x} + EBM = 68 + 0.98 = 68.98$

Interpretation 

We estimate with 95% confidence that the true population mean for all statistics exam scores is between 67.02 and 68.98.

Explanation of 95% Confidence Level 

95% of all confidence intervals constructed in this way contain the true value of the population 
mean statistics exam score. 

Comparing the results 

The 90% confidence interval is (67.18, 68.82). The 95% confidence interval is (67.02, 68.98). The 
95% confidence interval is wider. If you look at the graphs, because the area 0.95 is larger than the 
area 0.90, it makes sense that the 95% confidence interval is wider. 
[Figure 8.1: (a) normal curve with area 0.90 in the center and 0.05 in each tail; (b) normal curve with area 0.95 in the center and 0.025 in each tail.]



Summary: Effect of Changing the Confidence Level 

• Increasing the confidence level increases the error bound, making the confidence interval 
wider. 

• Decreasing the confidence level decreases the error bound, making the confidence interval 
narrower. 



Example 8.4: Changing the Sample Size: 

Suppose we change the original problem to see what happens to the error bound if the sample size 
is changed. 

Problem 

Leave everything the same except the sample size. Use the original 90% confidence level. What 
happens to the error bound and the confidence interval if we increase the sample size and use 
n=100 instead of n=36? What happens if we decrease the sample size to n=25 instead of n=36? 



• $\bar{x} = 68$

• $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

$\sigma = 3$; the confidence level is 90% (CL = 0.90); $z_{\alpha/2} = z_{0.05} = 1.645$



Solution A 

If we increase the sample size n to 100, we decrease the error bound. 



When n = 100: $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{3}{\sqrt{100}} = 0.4935$



Solution B 

If we decrease the sample size n to 25, we increase the error bound. 



When n = 25: $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{3}{\sqrt{25}} = 0.987$



Summary: Effect of Changing the Sample Size 



• Increasing the sample size causes the error bound to decrease, making the confidence inter- 
val narrower. 

• Decreasing the sample size causes the error bound to increase, making the confidence inter- 
val wider. 
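
Both summaries can be checked numerically. The short Python sketch below is our own illustration (the name `error_bound` is not from the text); it tabulates the error bound for the exam-score example at two confidence levels and three sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def error_bound(sigma, n, confidence_level):
    """EBM = z_(alpha/2) * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sigma / sqrt(n)

# sigma = 3 as in the exam-score example; vary CL and n.
for cl in (0.90, 0.95):
    for n in (25, 36, 100):
        print(f"CL={cl:.2f}  n={n:3d}  EBM={error_bound(3, n, cl):.3f}")
```

Reading down the table, the error bound shrinks as n grows and grows as the confidence level rises, which matches the two summaries above.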
8.2.3 Working Backwards to Find the Error Bound or Sample Mean

When we calculate a confidence interval, we find the sample mean and calculate the error bound and use 
them to calculate the confidence interval. But sometimes when we read statistical studies, the study may 
state the confidence interval only. If we know the confidence interval, we can work backwards to find both 
the error bound and the sample mean. 

Finding the Error Bound 

• From the upper value for the interval, subtract the sample mean 

• OR, From the upper value for the interval, subtract the lower value. Then divide the difference by 2. 

Finding the Sample Mean 

• Subtract the error bound from the upper value of the confidence interval 

• OR, Average the upper and lower endpoints of the confidence interval 

Notice that there are two methods to perform each calculation. You can choose the method that is easier to 
use with the information you know. 

Example 8.5 

Suppose we know that a confidence interval is (67.18, 68.82) and we want to find the error bound. We may know that the sample mean is 68. Or perhaps our source only gave the confidence interval and did not tell us the value of the sample mean.

Calculate the Error Bound: 

• If we know that the sample mean is 68: EBM = 68.82 - 68 = 0.82

• If we don't know the sample mean: $EBM = \frac{68.82 - 67.18}{2} = 0.82$

Calculate the Sample Mean:

• If we know the error bound: $\bar{x} = 68.82 - 0.82 = 68$

• If we don't know the error bound: $\bar{x} = \frac{67.18 + 68.82}{2} = 68$
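
The "work backwards" methods that need only the two endpoints can be written as a tiny Python helper. This is a sketch for illustration; the function name `from_interval` is ours.

```python
def from_interval(lower, upper):
    """Recover the point estimate and error bound from a confidence interval."""
    point_estimate = (lower + upper) / 2   # average of the endpoints
    error_bound = (upper - lower) / 2      # half the interval width
    return point_estimate, error_bound

x_bar, ebm = from_interval(67.18, 68.82)
print(round(x_bar, 2), round(ebm, 2))   # 68.0 0.82
```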



8.2.4 Calculating the Sample Size n 

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the 
required sample size. 

The error bound formula for a population mean when the population standard deviation is known is

$EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

The formula for sample size is $n = \frac{z^2 \sigma^2}{EBM^2}$, found by solving the error bound formula for n.

In this formula, z is $z_{\alpha/2}$, corresponding to the desired confidence level. A researcher planning a study who wants a specified confidence level and error bound can use this formula to calculate the size of the sample needed for the study.

Example 8.6 

The population standard deviation for the age of Foothill College students is 15 years. If we want to be 95% confident that the sample mean age is within 2 years of the true population mean age of Foothill College students, how many randomly selected Foothill College students must be surveyed?
From the problem, we know that $\sigma = 15$ and EBM = 2.
$z = z_{0.025} = 1.96$, because the confidence level is 95%.

$n = \frac{z^2 \sigma^2}{EBM^2} = \frac{1.96^2 \cdot 15^2}{2^2} = 216.09$ using the sample size equation.

Use n = 217: Always round the answer UP to the next higher integer to ensure that the sample size is large enough.

Therefore, 217 Foothill College students should be surveyed in order to be 95% confident that we 
are within 2 years of the true population mean age of Foothill College students. 
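
The sample size calculation, including the round-up rule, can be sketched in Python as follows. The function name `sample_size_for_mean` is an assumption of ours, and the code uses an unrounded z-score, so the intermediate value differs slightly from 216.09 while the rounded-up answer is the same.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, error_bound, confidence_level):
    """Smallest n with z_(alpha/2) * sigma / sqrt(n) <= error_bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = (z * sigma / error_bound) ** 2
    return ceil(n)   # always round UP to the next integer

print(sample_size_for_mean(sigma=15, error_bound=2, confidence_level=0.95))   # 217
```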

**With contributions from Roberta Bloom 



8.3 Confidence Interval, Single Population Mean, Standard Deviation 
Unknown, Student-T 3 

In practice, we rarely know the population standard deviation. In the past, when the sample size was large, this did not present a problem to statisticians. They used the sample standard deviation s as an estimate for $\sigma$ and proceeded as before to calculate a confidence interval with close enough results. However, statisticians ran into problems when the sample size was small. A small sample size caused inaccuracies in the confidence interval.

William S. Gosset (1876-1937) of the Guinness brewery in Dublin, Ireland ran into this problem. His experiments with hops and barley produced very few samples. Just replacing $\sigma$ with s did not produce accurate results when he tried to calculate a confidence interval. He realized that he could not use a normal distribution for the calculation; he found that the actual distribution depends on the sample size. This problem led him to "discover" what is called the Student's-t distribution. The name comes from the fact that Gosset wrote under the pen name "Student."

Up until the mid 1970s, some statisticians used the normal distribution approximation for large sample sizes and only used the Student's-t distribution for sample sizes of at most 30. With the common use of graphing calculators and computers, the practice is to use the Student's-t distribution whenever s is used as an estimate for $\sigma$.

If you draw a simple random sample of size n from a population that has approximately a normal distribution with mean $\mu$ and unknown population standard deviation $\sigma$ and calculate the t-score $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, then the t-scores follow a Student's-t distribution with n - 1 degrees of freedom. The t-score has the same interpretation as the z-score. It measures how far $\bar{x}$ is from its mean $\mu$. For each sample size n, there is a different Student's-t distribution.

The degrees of freedom, n - 1, come from the calculation of the sample standard deviation s. In Chapter 2, we used n deviations ($x - \bar{x}$ values) to calculate s. Because the sum of the deviations is 0, we can find the last deviation once we know the other n - 1 deviations. The other n - 1 deviations can change or vary freely. We call the number n - 1 the degrees of freedom (df).

Properties of the Student's-t Distribution 

• The graph for the Student's-t distribution is similar to the Standard Normal curve. 

• The mean for the Student's-t distribution is 0 and the distribution is symmetric about 0.



3 This content is available online at <http://cnx.org/content/m16959/1.24/>.
• The Student's-t distribution has more probability in its tails than the Standard Normal distribution
because the spread of the t distribution is greater than the spread of the Standard Normal. So the 
graph of the Student's-t distribution will be thicker in the tails and shorter in the center than the 
graph of the Standard Normal distribution. 

• The exact shape of the Student's-t distribution depends on the "degrees of freedom". As the degrees of freedom increase, the graph of the Student's-t distribution becomes more like the graph of the Standard Normal distribution.

• The underlying population of individual observations is assumed to be normally distributed with unknown population mean $\mu$ and unknown population standard deviation $\sigma$. The size of the underlying population is generally not relevant unless it is very small. If it is bell shaped (normal) then the assumption is met and doesn't need discussion. Random sampling is assumed but it is a completely separate assumption from normality.

Calculators and computers can easily calculate any Student's-t probabilities. The TI-83,83+,84+ have a tcdf 
function to find the probability for given values of t. The grammar for the tcdf command is tcdf(lower 
bound, upper bound, degrees of freedom). However for confidence intervals, we need to use inverse 
probability to find the value of t when we know the probability. 

For the TI-84+ you can use the invT command on the DISTRibution menu. The invT command works similarly to invNorm. The invT command requires two inputs: invT(area to the left, degrees of freedom). The output is the t-score that corresponds to the area we specified.

The TI-83 and 83+ do not have the invT command. (The TI-89 has an inverse T command.) 
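
On a computer without an invT command, the inverse lookup can be sketched using only the Python standard library. The code below is our own illustration, not a method from the text: it integrates the Student's-t density numerically with Simpson's rule and then uses bisection to find the t-score with the requested area to its left. The function names `t_pdf`, `t_cdf` and `inv_t` are assumptions of ours, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of the Student's-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """Area to the left of t: Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps   # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def inv_t(area_left, df):
    """t-score with the given area to its left, found by bisection."""
    lo, hi = -100.0, 100.0   # search range; wide enough for typical use
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area_left:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(inv_t(0.975, 14), 3))   # 2.145
```

The arguments mirror invT(area to the left, degrees of freedom), so `inv_t(0.975, 14)` plays the role of invT(.975, 14).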

A probability table for the Student's-t distribution can also be used. The table gives t-scores that correspond to the confidence level (column) and degrees of freedom (row). (The TI-86 does not have an invT program or command, so if you are using that calculator, you need to use a probability table for the Student's-t distribution.) When using a t-table, note that some tables are formatted to show the confidence level in the column headings, while the column headings in some tables may show only the corresponding area in one or both tails.

A Student's-t table (See the Table of Contents 15. Tables) gives t-scores given the degrees of free- 
dom and the right-tailed probability. The table is very limited. Calculators and computers can easily 
calculate any Student's-t probabilities. 

The notation for the Student's-t distribution (using T as the random variable) is

• $T \sim t_{df}$ where df = n - 1.

• For example, if we have a sample of size n = 20 items, then we calculate the degrees of freedom as df = n - 1 = 20 - 1 = 19 and we write the distribution as $T \sim t_{19}$.

If the population standard deviation is not known, the error bound for a population mean is: 

• $EBM = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

• $t_{\alpha/2}$ is the t-score with area to the right equal to $\frac{\alpha}{2}$

• use df = n - 1 degrees of freedom

• s = sample standard deviation

The format for the confidence interval is:

$(\bar{x} - EBM, \bar{x} + EBM)$.

The TI-83, 83+ and 84 calculators have a function that calculates the confidence interval directly. To get to it,

Press STAT

Arrow over to TESTS.

Arrow down to 8:TInterval and press ENTER (or just press 8).

Example 8.7 

Suppose you do a study of acupuncture to determine how effective it is in relieving pain. 
You measure sensory rates for 15 subjects with the results given below. Use the sample data 
to construct a 95% confidence interval for the mean sensory rate for the population (assumed 
normal) from which you took the data. 

The solution is shown step-by-step and by using the TI-83, 83+ and 84+ calculators. 
8.6; 9.4; 7.9; 6.8; 8.3; 7.3; 9.2; 9.6; 8.7; 11.4; 10.3; 5.4; 8.1; 5.5; 6.9 

Solution 

• You can use technology to directly calculate the confidence interval. 

• The first solution is step-by-step (Solution A). 

• The second solution uses the TI-83+ and TI-84 calculators (Solution B).

Solution A 

To find the confidence interval, you need the sample mean, $\bar{x}$, and the EBM.

$\bar{x} = 8.2267$; $s = 1.6722$; $n = 15$

df = 15 - 1 = 14

CL = 0.95 so $\alpha = 1 - CL = 1 - 0.95 = 0.05$

$\frac{\alpha}{2} = 0.025$, so $t_{\alpha/2} = t_{0.025}$

The area to the right of $t_{0.025}$ is 0.025 and the area to the left of $t_{0.025}$ is 1 - 0.025 = 0.975.

$t_{\alpha/2} = t_{0.025} = 2.14$ using invT(.975, 14) on the TI-84+ calculator.

$EBM = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

$EBM = 2.14 \cdot \frac{1.6722}{\sqrt{15}} = 0.924$

$\bar{x} - EBM = 8.2267 - 0.9240 = 7.3$

$\bar{x} + EBM = 8.2267 + 0.9240 = 9.15$

The 95% confidence interval is (7.30, 9.15).

We estimate with 95% confidence that the true population mean sensory rate is between 7.30 and 
9.15. 

Solution B 

Using a function of the TI-83, TI-83+ or TI-84 calculators: 

Press STAT and arrow over to TESTS. 

Arrow down to 8:TInterval and press ENTER (or you can just press 8). Arrow to Data and press ENTER.

Arrow down to List and enter the list name where you put the data.

Arrow down to Freq and enter 1.

Arrow down to C-level and enter .95

Arrow down to Calculate and press ENTER.

The 95% confidence interval is (7.3006, 9.1527)

NOTE: When calculating the error bound, a probability table for the Student's-t distribution can 
also be used to find the value of t. The table gives t-scores that correspond to the confidence level 
(column) and degrees of freedom (row); the t-score is found where the row and column intersect 
in the table. 
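
The same interval can be reproduced from the raw data in Python. This is a sketch for illustration only. The sample mean and sample standard deviation are computed with the standard `statistics` module; the critical value 2.145 is typed in by hand, as if read from a Student's-t table for df = 14, which is an assumption of this sketch (the hand solution above rounds it to 2.14).

```python
from math import sqrt
from statistics import mean, stdev

data = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7,
        11.4, 10.3, 5.4, 8.1, 5.5, 6.9]

n = len(data)
x_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation (divides by n - 1)
t_crit = 2.145       # t_(0.025) for df = 14, taken from a t-table

ebm = t_crit * s / sqrt(n)   # error bound for the mean
print(round(x_bar, 4), round(s, 4))                  # 8.2267 1.6722
print(round(x_bar - ebm, 2), round(x_bar + ebm, 2))  # 7.3 9.15
```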

**With contributions from Roberta Bloom 



8.4 Confidence Interval for a Population Proportion 4 

During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within 3 percentage points. Often, election polls are calculated with 95% confidence. So, the pollsters would be 95% confident that the true proportion of voters who favored the candidate would be between 0.37 and 0.43: (0.40 - 0.03, 0.40 + 0.03).

Investors in the stock market are interested in the true proportion of stocks that go up and down each week. 
Businesses that sell personal computers are interested in the proportion of households in the United States 
that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that 
go up or down each week and for the true proportion of households in the United States that own personal 
computers. 

The procedure to find the confidence interval, the sample size, the error bound, and the confidence level 
for a proportion is similar to that for the population mean. The formulas are different. 

How do you know you are dealing with a proportion problem? First, the underlying distribution is 
binomial. (There is no mention of a mean or average.) If X is a binomial random variable, then X ~ B (n, p) 
where n = the number of trials and p = the probability of a success. To form a proportion, take X, the 
random variable for the number of successes and divide it by n, the number of trials (or the sample size). 
The random variable P' (read "P prime") is that proportion,

$P' = \frac{X}{n}$

(Sometimes the random variable is denoted as $\hat{P}$, read "P hat".)

When n is large and p is not close to 0 or 1, we can use the normal distribution to approximate the binomial.

$X \sim N\left(n \cdot p, \sqrt{n \cdot p \cdot q}\right)$

If we divide the random variable by n, the mean by n, and the standard deviation by n, we get a normal distribution of proportions with P', called the estimated proportion, as the random variable. (Recall that a proportion = the number of successes divided by n.)

$\frac{X}{n} = P' \sim N\left(\frac{n \cdot p}{n}, \frac{\sqrt{n \cdot p \cdot q}}{n}\right)$

Using algebra to simplify: $\frac{\sqrt{n \cdot p \cdot q}}{n} = \sqrt{\frac{p \cdot q}{n}}$





4 This content is available online at <http://cnx.org/content/m16963/1.20/>.
P' follows a normal distribution for proportions: $P' \sim N\left(p, \sqrt{\frac{p \cdot q}{n}}\right)$
The confidence interval has the form (p' - EBP, p' + EBP).



p' = the estimated proportion of successes (p' is a point estimate for p, the true proportion) 

x = the number of successes. 

n = the size of the sample 

The error bound for a proportion is 

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p' q'}{n}}$ where $q' = 1 - p'$

This formula is similar to the error bound formula for a mean, except that the "appropriate standard deviation" is different. For a mean, when the population standard deviation is known, the appropriate standard deviation that we use is $\frac{\sigma}{\sqrt{n}}$. For a proportion, the appropriate standard deviation is $\sqrt{\frac{p q}{n}}$. However, in the error bound formula, we use $\sqrt{\frac{p' q'}{n}}$ as the standard deviation, instead of $\sqrt{\frac{p q}{n}}$.

In the error bound formula, the sample proportions p' and q' are estimates of the unknown population proportions p and q. The estimated proportions p' and q' are used because p and q are not known. p' and q' are calculated from the data: p' is the estimated proportion of successes, and q' is the estimated proportion of failures.

The confidence interval can only be used if the number of successes np' and the number of failures nq' are 
both larger than 5. 

NOTE: For the normal distribution of proportions, the z-score formula is as follows.
If $P' \sim N\left(p, \sqrt{\frac{p q}{n}}\right)$ then the z-score formula is $z = \frac{p' - p}{\sqrt{\frac{p q}{n}}}$



Example 8.8 

Suppose that a market research firm is hired to estimate the percent of adults living in a large 
city who have cell phones. 500 randomly selected adult residents in this city are surveyed to 
determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes - they 
own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the
true proportion of adult residents of this city who have cell phones.

Solution 

• You can use technology to directly calculate the confidence interval. 

• The first solution is step-by-step (Solution A). 

• The second solution uses a function of the TI-83, 83+ or 84 calculators (Solution B). 

Solution A 

Let X = the number of people in the sample who have cell phones. X is binomial. $X \sim B\left(500, \frac{421}{500}\right)$.

To calculate the confidence interval, you must find p', q', and EBP.
n = 500; x = the number of successes = 421

$p' = \frac{x}{n} = \frac{421}{500} = 0.842$

p' = 0.842 is the sample proportion; this is the point estimate of the population proportion. 

$q' = 1 - p' = 1 - 0.842 = 0.158$

Since CL = 0.95, then $\alpha = 1 - CL = 1 - 0.95 = 0.05$ and $\frac{\alpha}{2} = 0.025$.

Then $z_{\alpha/2} = z_{0.025} = 1.96$

Use the TI-83, 83+ or 84+ calculator command invNorm(0.975, 0, 1) to find $z_{0.025}$. Remember that the area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 0.975. This can also be found using appropriate commands on other calculators, using a computer, or using a Standard Normal probability table.

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p' q'}{n}} = 1.96 \cdot \sqrt{\frac{(0.842)(0.158)}{500}} = 0.032$

$p' - EBP = 0.842 - 0.032 = 0.81$

$p' + EBP = 0.842 + 0.032 = 0.874$

The confidence interval for the true binomial population proportion is 
(p'-EBP,p' + EBP) =(0.810,0.874). 

Interpretation 

We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city 
have cell phones. 

Explanation of 95% Confidence Level 

95% of the confidence intervals constructed in this way would contain the true value for the 
population proportion of all adult residents of this city who have cell phones. 

Solution B 

Using a function of the TI-83, 83+ or 84 calculators: 

Press STAT and arrow over to TESTS. 
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 421. 
Arrow down to n and enter 500. 
Arrow down to C-Level and enter .95. 
Arrow down to Calculate and press ENTER. 
The confidence interval is (0.81003, 0.87397). 
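
For comparison, the calculation can be sketched in Python. This is a minimal illustration under the same normal approximation; the function name `proportion_interval` is ours, and the check on the number of successes and failures follows the "both larger than 5" condition stated earlier in this section.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence_level):
    """Normal-approximation confidence interval for a population proportion."""
    p_prime = successes / n
    q_prime = 1 - p_prime
    # Condition from the text: successes and failures must both exceed 5.
    if n * p_prime <= 5 or n * q_prime <= 5:
        raise ValueError("too few successes or failures for this method")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebp = z * sqrt(p_prime * q_prime / n)   # error bound for the proportion
    return p_prime - ebp, p_prime + ebp

lower, upper = proportion_interval(successes=421, n=500, confidence_level=0.95)
print(round(lower, 3), round(upper, 3))   # 0.81 0.874
```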



Example 8.9 

For a class project, a political science student at a large university wants to estimate the percent of students that are registered voters. He surveys 500 students and finds that 300 are registered voters. Compute a 90% confidence interval for the true percent of students that are registered voters and interpret the confidence interval.



Solution 

• You can use technology to directly calculate the confidence interval. 

• The first solution is step-by-step (Solution A). 

• The second solution uses a function of the TI-83, 83+ or 84 calculators (Solution B). 

Solution A 

x = 300 and n = 500.

$p' = \frac{x}{n} = \frac{300}{500} = 0.600$

$q' = 1 - p' = 1 - 0.600 = 0.400$

Since CL = 0.90, then $\alpha = 1 - CL = 1 - 0.90 = 0.10$ and $\frac{\alpha}{2} = 0.05$.

$z_{\alpha/2} = z_{0.05} = 1.645$

Use the TI-83, 83+ or 84+ calculator command invNorm(0.95, 0, 1) to find $z_{0.05}$. Remember that the area to the right of $z_{0.05}$ is 0.05 and the area to the left of $z_{0.05}$ is 0.95. This can also be found using appropriate commands on other calculators, using a computer, or using a Standard Normal probability table.

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p' q'}{n}} = 1.645 \cdot \sqrt{\frac{(0.60)(0.40)}{500}} = 0.036$

$p' - EBP = 0.60 - 0.036 = 0.564$

$p' + EBP = 0.60 + 0.036 = 0.636$

The confidence interval for the true binomial population proportion is 

(p'-EBP,p' + EBP) =(0.564,0.636). 

Interpretation: 

• We estimate with 90% confidence that the true percent of all students that are registered 
voters is between 56.4% and 63.6%. 

• Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL 
students are registered voters. 

Explanation of 90% Confidence Level 

90% of all confidence intervals constructed in this way contain the true value for the population 
percent of students that are registered voters. 

Solution B 

Using a function of the TI-83, 83+ or 84 calculators: 

Press STAT and arrow over to TESTS. 
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 300. 
Arrow down to n and enter 500. 
Arrow down to C-Level and enter .90. 
Arrow down to Calculate and press ENTER. 
The confidence interval is (0.564, 0.636). 
8.4.1 Calculating the Sample Size n

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the 
required sample size. 

The error bound formula for a population proportion is 

• $EBP = z_{\alpha/2} \cdot \sqrt{\frac{p' q'}{n}}$

• Solving for n gives you an equation for the sample size.

• $n = \frac{z_{\alpha/2}^2 \cdot p' q'}{EBP^2}$

Example 8.10 

Suppose a mobile phone company wants to determine the current percentage of customers aged 50+ that use text messaging on their cell phone. How many customers aged 50+ should the company survey in order to be 90% confident that the estimated (sample) proportion is within 3 percentage points of the true population proportion of customers aged 50+ that use text messaging on their cell phone?



Solution 

From the problem, we know that EBP = 0.03 (3% = 0.03) and

$z_{\alpha/2} = z_{0.05} = 1.645$ because the confidence level is 90%.

However, in order to find n, we need to know the estimated (sample) proportion p'. Remember that q' = 1 - p'. But, we do not know p' yet. Since we multiply p' and q' together, we make them both equal to 0.5 because p'q' = (.5)(.5) = .25 results in the largest possible product. (Try other products: (.6)(.4) = .24; (.3)(.7) = .21; (.2)(.8) = .16 and so on.) The largest possible product gives us the largest n. This gives us a large enough sample so that we can be 90% confident that we are within 3 percentage points of the true population proportion. To calculate the sample size n, use the formula and make the substitutions.

$n = \frac{z^2 p' q'}{EBP^2}$ gives $n = \frac{1.645^2 (.5)(.5)}{.03^2} = 751.7$

Round the answer to the next higher value. The sample size should be 752 cell phone customers 
aged 50+ in order to be 90% confident that the estimated (sample) proportion is within 3 percent- 
age points of the true population proportion of all customers aged 50+ that use text messaging on 
their cell phone. 
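
A short Python sketch of this sample size calculation is given below for illustration. The function name `sample_size_for_proportion` and the parameter `p_guess` are ours; the default of 0.5 reflects the conservative choice described above for when p' is not yet known.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(error_bound, confidence_level, p_guess=0.5):
    """Smallest n with z_(alpha/2) * sqrt(p'q'/n) <= error_bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    q_guess = 1 - p_guess
    n = z ** 2 * p_guess * q_guess / error_bound ** 2
    return ceil(n)   # round up to the next whole customer

print(sample_size_for_proportion(error_bound=0.03, confidence_level=0.90))   # 752
```

Passing a different `p_guess` (for example, an estimate from an earlier survey) gives a sample size that is no larger than the one obtained with 0.5.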

**With contributions from Roberta Bloom. 
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8.5 Summary of Formulas 5 

Formula 8.1: General form of a confidence interval 

(lower value, upper value) = (point estimate — error bound, point estimate + error bound) 

Formula 8.2: To find the error bound when you know the confidence interval 

i j i ■ i j.- j. r^n t. j upper value-lower value 

error bound = upper value — point estimate OR error bound = -*-*- j 

Formula 8.3: Single Population Mean, Known Standard Deviation, Normal Distribution 
Use the Normal Distribution for Means (Section 7.2) EBM = z * • -j- 

The confidence interval has the format (x — EBM, x + EBM) . 

Formula 8.4: Single Population Mean, Unknown Standard Deviation, Student' s-t Distribution 

Use the Student' s-t Distribution with degrees of freedom df = n — 1. EBM = tx • -4= 

° 2 v« 

Formula 8.5: Single Population Proportion, Normal Distribution 

Use the Normal Distribution for a single population proportion $p' = \frac{x}{n}$

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p'q'}{n}}$, where $p' + q' = 1$

The confidence interval has the format (p' - EBP, p' + EBP). 

Formula 8.6: Point Estimates 
$\bar{x}$ is a point estimate for $\mu$
p' is a point estimate for p 
s is a point estimate for $\sigma$
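
The two normal-distribution formulas (8.3 and 8.5) can be sketched in Python as follows. This is an illustrative sketch, not part of the original text: the function names are hypothetical, the input numbers are made up, and the critical value is taken from the standard library's `statistics.NormalDist`. Formula 8.4 is left out because the Python standard library does not provide Student's-t critical values.

```python
import math
from statistics import NormalDist


def z_critical(confidence_level):
    """z_(alpha/2): the point with area alpha/2 in the upper tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_interval_known_sigma(x_bar, sigma, n, confidence_level):
    """Formula 8.3: (x_bar - EBM, x_bar + EBM), EBM = z * sigma / sqrt(n)."""
    ebm = z_critical(confidence_level) * sigma / math.sqrt(n)
    return x_bar - ebm, x_bar + ebm


def proportion_interval(x, n, confidence_level):
    """Formula 8.5: (p' - EBP, p' + EBP), EBP = z * sqrt(p'q' / n)."""
    p_prime = x / n
    q_prime = 1 - p_prime
    ebp = z_critical(confidence_level) * math.sqrt(p_prime * q_prime / n)
    return p_prime - ebp, p_prime + ebp


# Made-up inputs, for illustration only.
print(mean_interval_known_sigma(x_bar=68, sigma=3, n=36, confidence_level=0.90))
print(proportion_interval(x=421, n=500, confidence_level=0.95))
```

Both functions return the interval as a (lower, upper) pair, matching the general form in Formula 8.1.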



5 This content is available online at <http://cnx.org/content/m16973/1.8/>. 
8.6 Practice 1: Confidence Intervals for Averages, Known Population 
Standard Deviation 6 

8.6.1 Student Learning Outcomes 

• The student will calculate confidence intervals for means when the population standard deviation is 
known. 

8.6.2 Given 

The mean age for all Foothill College students for a recent Fall term was 33.2. The population standard de- 
viation has been pretty consistent at 15. Suppose that twenty-five Winter students were randomly selected. 
The mean age for the sample was 30.4. We are interested in the true mean age for Winter Foothill College 
students. (http://research.fhda.edu/factbook/FH_Demo_Trends/FoothillDemographicTrends.htm 7) 

Let X = the age of a Winter Foothill College student 

8.6.3 Calculating the Confidence Interval 

Exercise 8.6.1 (Solution on p. 355.) 

$\bar{x}$ = 

Exercise 8.6.2 (Solution on p. 355.) 

n= 

Exercise 8.6.3 (Solution on p. 355.) 

15= (insert symbol here) 

Exercise 8.6.4 (Solution on p. 355.) 

Define the Random Variable, $\bar{X}$, in words. 

$\bar{X}$ = 

Exercise 8.6.5 (Solution on p. 355.) 

What is $\bar{x}$ estimating? 

Exercise 8.6.6 (Solution on p. 355.) 

Is $\sigma_x$ known? 

Exercise 8.6.7 (Solution on p. 355.) 

As a result of your answer to (4), state the exact distribution to use when calculating the Confi- 
dence Interval. 



8.6.4 Explaining the Confidence Interval 

Construct a 95% Confidence Interval for the true mean age of Winter Foothill College students. 

Exercise 8.6.8 (Solution on p. 355.) 

How much area is in both tails (combined)? $\alpha$ = 

Exercise 8.6.9 (Solution on p. 355.) 

How much area is in each tail? $\frac{\alpha}{2}$ = 



Exercise 8.6.10 (Solution on p. 355.) 

Identify the following specifications: 



6 This content is available online at <http://cnx.org/content/m16970/1.13/>. 
7 http://research.fhda.edu/factbook/FH_Demo_Trends/FoothillDemographicTrends.htm 
a. lower limit = 

b. upper limit = 

c. error bound = 



Exercise 8.6.11 

The 95% Confidence Interval is: 

Exercise 8.6.12 



(Solution on p. 355.) 



Fill in the blanks on the graph with the areas, upper and lower limits of the Confidence Interval, and 
the sample mean. 

Figure 8.2 (normal curve with blanks for the two tail areas $\frac{\alpha}{2}$, the confidence level C.L., and $\bar{x}$) 



Exercise 8.6.13 

In one complete sentence, explain what the interval means. 



8.6.5 Discussion Questions 

Exercise 8.6.14 

Using the same mean, standard deviation and level of confidence, suppose that n were 69 instead 
of 25. Would the error bound become larger or smaller? How do you know? 

Exercise 8.6.15 

Using the same mean, standard deviation and sample size, how would the error bound change if 
the confidence level were reduced to 90%? Why? 
8.7 Practice 2: Confidence Intervals for Averages, Unknown Population 
Standard Deviation 8 

8.7.1 Student Learning Outcomes 

• The student will calculate confidence intervals for means when the population standard deviation is 
unknown. 



8.7.2 Given 

The following real data are the result of a random survey of 39 national flags (with replacement between 
picks) from various countries. We are interested in finding a confidence interval for the true mean number 
of colors on a national flag. Let X = the number of colors on a national flag. 



X    Freq. 
1    1 
2    7 
3    18 
4    7 
5    6 

Table 8.1 



8.7.3 Calculating the Confidence Interval 

Exercise 8.7.1 (Solution on p. 355.) 
Calculate the following: 

a. $\bar{x}$ = 

b. $s_x$ = 

c. n = 

Exercise 8.7.2 (Solution on p. 355.) 

Define the Random Variable, $\bar{X}$, in words. $\bar{X}$ = 

Exercise 8.7.3 (Solution on p. 355.) 

What is $\bar{x}$ estimating? 

Exercise 8.7.4 (Solution on p. 355.) 
Is $\sigma_x$ known? 

Exercise 8.7.5 (Solution on p. 355.) 

As a result of your answer to (4), state the exact distribution to use when calculating the Confidence 
Interval. 



8 This content is available online at <http://cnx.org/content/m16971/1.14/>. 
8.7.4 Confidence Interval for the True Mean Number 

Construct a 95% Confidence Interval for the true mean number of colors on national flags. 

Exercise 8.7.6 (Solution on p. 355.) 

How much area is in both tails (combined)? $\alpha$ = 

Exercise 8.7.7 (Solution on p. 355.) 

How much area is in each tail? $\frac{\alpha}{2}$ = 

Exercise 8.7.8 (Solution on p. 355.) 

Calculate the following: 

a. lower limit = 

b. upper limit = 

c. error bound = 

Exercise 8.7.9 (Solution on p. 356.) 

The 95% Confidence Interval is: 

Exercise 8.7.10 

Fill in the blanks on the graph with the areas, upper and lower limits of the Confidence Interval 
and the sample mean. 

Figure 8.3 (curve with blanks for the two tail areas $\frac{\alpha}{2}$, the confidence level CL, and $\bar{x}$) 



Exercise 8.7.11 

In one complete sentence, explain what the interval means. 



8.7.5 Discussion Questions 

Exercise 8.7.12 

Using the same $\bar{x}$, $s_x$, and level of confidence, suppose that n were 69 instead of 39. Would the 
error bound become larger or smaller? How do you know? 

Exercise 8.7.13 

Using the same $\bar{x}$, $s_x$, and n = 39, how would the error bound change if the confidence level were 
reduced to 90%? Why? 
8.8 Practice 3: Confidence Intervals for Proportions 9 
8.8.1 Student Learning Outcomes 

• The student will calculate confidence intervals for proportions. 



8.8.2 Given 

The Ice Chalet offers dozens of different beginning ice-skating classes. All of the class names are put into a 
bucket. The 5 P.M., Monday night, ages 8 - 12, beginning ice-skating class was picked. In that class were 64 
girls and 16 boys. Suppose that we are interested in the true proportion of girls, ages 8 - 12, in all beginning 
ice-skating classes at the Ice Chalet. Assume that the children in the selected class are a random sample of 
the population. 

8.8.3 Estimated Distribution 

Exercise 8.8.1 

What is being counted? 

Exercise 8.8.2 (Solution on p. 356.) 

In words, define the Random Variable X. X = 

Exercise 8.8.3 (Solution on p. 356.) 

Calculate the following: 

a. x = 

b. n = 

c. p' = 

Exercise 8.8.4 (Solution on p. 356.) 

State the estimated distribution of X. X ~ 

Exercise 8.8.5 (Solution on p. 356.) 

Define a new Random Variable P'. What is p' estimating? 

Exercise 8.8.6 (Solution on p. 356.) 

In words, define the Random Variable P'. P' = 

Exercise 8.8.7 

State the estimated distribution of P'. P' ~ 



8.8.4 Explaining the Confidence Interval 

Construct a 92% Confidence Interval for the true proportion of girls in the age 8-12 beginning ice-skating 
classes at the Ice Chalet. 

Exercise 8.8.8 (Solution on p. 356.) 

How much area is in both tails (combined)? $\alpha$ = 

Exercise 8.8.9 (Solution on p. 356.) 

How much area is in each tail? $\frac{\alpha}{2}$ = 

Exercise 8.8.10 (Solution on p. 356.) 

Calculate the following: 



9 This content is available online at <http://cnx.org/content/m16968/1.13/>. 
a. lower limit = 

b. upper limit = 

c. error bound = 



Exercise 8.8.11 

The 92% Confidence Interval is: 

Exercise 8.8.12 



(Solution on p. 356.) 



Fill in the blanks on the graph with the areas, upper and lower limits of the Confidence Interval, and 
the sample proportion. 

Figure 8.4 (normal curve with blanks for the two tail areas $\frac{\alpha}{2}$, the confidence level C.L., and p') 



Exercise 8.8.13 

In one complete sentence, explain what the interval means. 



8.8.5 Discussion Questions 

Exercise 8.8.14 

Using the same p' and level of confidence, suppose that n were increased to 100. Would the error 
bound become larger or smaller? How do you know? 

Exercise 8.8.15 

Using the same p' and n = 80, how would the error bound change if the confidence level were 
increased to 98%? Why? 

Exercise 8.8.16 

If you decreased the allowable error bound, why would the minimum sample size increase (keep- 
ing the same level of confidence)? 
8.9 Homework 10 

NOTE: If you are using a student's-t distribution for a homework problem below, you may assume 
that the underlying population is normally distributed. (In general, you must first prove that 
assumption, though.) 

Exercise 8.9.1 (Solution on p. 356.) 

Among various ethnic groups, the standard deviation of heights is known to be approximately 3 
inches. We wish to construct a 95% confidence interval for the mean height of male Swedes. 48 
male Swedes are surveyed. The sample mean is 71 inches. The sample standard deviation is 2.8 
inches. 



a. i. $\bar{x}$ = 
ii. $\sigma$ = 
iii. $s_x$ = 
iv. n = 
v. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 

c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 95% confidence interval for the population mean height of male Swedes. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. What will happen to the level of confidence obtained if 1000 male Swedes are surveyed instead 

of 48? Why? 

Exercise 8.9.2 

In six packages of "The Flintstones® Real Fruit Snacks" there were 5 Bam-Bam snack pieces. The 
total number of snack pieces in the six bags was 68. We wish to calculate a 96% confidence interval 
for the population proportion of Bam-Bam snack pieces. 

a. Define the Random Variables X and P', in words. 

b. Which distribution should you use for this problem? Explain your choice 

c. Calculate p'. 

d. Construct a 96% confidence interval for the population proportion of Bam-Bam snack pieces 

per bag. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. Do you think that six packages of fruit snacks yield enough data to give accurate results? Why 

or why not? 

Exercise 8.9.3 (Solution on p. 356.) 

A random survey of enrollment at 35 community colleges across the United States yielded the 
following figures (source: Microsoft Bookshelf): 6414; 1550; 2109; 9350; 21828; 4300; 5944; 5722; 
2825; 2044; 5481; 5200; 5853; 2750; 10012; 6357; 27000; 9414; 7681; 3200; 17500; 9200; 7380; 18314; 
6557; 13713; 17768; 7493; 2771; 2861; 1263; 7285; 28165; 5080; 11622. Assume the underlying 
population is normal. 

a. i. $\bar{x}$ = 

10 This content is available online at <http://cnx.org/content/m16966/1.16/>. 
ii. $s_x$ = 

iii. n = 

iv. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 

c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 95% confidence interval for the population mean enrollment at community colleges 

in the United States. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. What will happen to the error bound and confidence interval if 500 community colleges were 

surveyed? Why? 

Exercise 8.9.4 

From a stack of IEEE Spectrum magazines, announcements for 84 upcoming engineering confer- 
ences were randomly picked. The mean length of the conferences was 3.94 days, with a standard 
deviation of 1.28 days. Assume the underlying population is normal. 

a. Define the Random Variables X and $\bar{X}$, in words. 

b. Which distribution should you use for this problem? Explain your choice. 

c. Construct a 95% confidence interval for the population mean length of engineering conferences. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

Exercise 8.9.5 (Solution on p. 357.) 

Suppose that a committee is studying whether or not there is waste of time in our judicial system. 
It is interested in the mean amount of time individuals waste at the courthouse waiting to be called 
for service. The committee randomly surveyed 81 people. The sample mean was 8 hours with a 
sample standard deviation of 4 hours. 

a. i. $\bar{x}$ = 
ii. $s_x$ = 
iii. n = 
iv. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 

c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 95% confidence interval for the population mean time wasted. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. Explain in a complete sentence what the confidence interval means. 

Exercise 8.9.6 

Suppose that an accounting firm does a study to determine the time needed to complete one per- 
son's tax forms. It randomly surveys 100 people. The sample mean is 23.6 hours. There is a known 
standard deviation of 7.0 hours. The population distribution is assumed to be normal. 



a. i. $\bar{x}$ = 
ii. $\sigma$ = 
iii. $s_x$ = 
iv. n = 
v. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 

c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 90% confidence interval for the population mean time to complete the tax forms. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. If the firm wished to increase its level of confidence and keep the error bound the same by 

taking another survey, what changes should it make? 

f. If the firm did another survey, kept the error bound the same, and only surveyed 49 people, 

what would happen to the level of confidence? Why? 

g. Suppose that the firm decided that it needed to be at least 96% confident of the population 

mean length of time to within 1 hour. How would the number of people the firm surveys 
change? Why? 

Exercise 8.9.7 (Solution on p. 357.) 

A sample of 16 small bags of the same brand of candies was selected. Assume that the population 
distribution of bag weights is normal. The weight of each bag was then recorded. The mean 
weight was 2 ounces with a standard deviation of 0.12 ounces. The population standard deviation 
is known to be 0.1 ounce. 

a. i. $\bar{x}$ = 
ii. $\sigma$ = 
iii. $s_x$ = 
iv. n = 
v. n - 1 = 

b. Define the Random Variable X, in words. 

c. Define the Random Variable $\bar{X}$, in words. 

d. Which distribution should you use for this problem? Explain your choice. 

e. Construct a 90% confidence interval for the population mean weight of the candies. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

f. Construct a 98% confidence interval for the population mean weight of the candies. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

g. In complete sentences, explain why the confidence interval in (f) is larger than the confidence 

interval in (e). 
h. In complete sentences, give an interpretation of what the interval in (f) means. 

Exercise 8.9.8 

A pharmaceutical company makes tranquilizers. It is assumed that the distribution for the length 
of time they last is approximately normal. Researchers in a hospital used the drug on a random 
sample of 9 patients. The effective period of the tranquilizer for each patient (in hours) was as 
follows: 2.7; 2.8; 3.0; 2.3; 2.3; 2.2; 2.8; 2.1; and 2.4 . 

a. i. $\bar{x}$ = 

ii. $s_x$ = 

b. Define the Random Variable X, in words. 

c. Define the Random Variable $\bar{X}$, in words. 

d. Which distribution should you use for this problem? Explain your choice. 

e. Construct a 95% confidence interval for the population mean length of time. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

f. What does it mean to be "95% confident" in this problem? 

Exercise 8.9.9 (Solution on p. 357.) 

Suppose that 14 children were surveyed to determine how long they had to use training wheels. 
It was revealed that they used them an average of 6 months with a sample standard deviation of 
3 months. Assume that the underlying population distribution is normal. 

a. i. $\bar{x}$ = 
ii. $s_x$ = 
iii. n = 
iv. n - 1 = 

b. Define the Random Variable X, in words. 

c. Define the Random Variable $\bar{X}$, in words. 

d. Which distribution should you use for this problem? Explain your choice. 

e. Construct a 99% confidence interval for the population mean length of time using training 

wheels. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

f. Why would the error bound change if the confidence level was lowered to 90%? 

Exercise 8.9.10 

Insurance companies are interested in knowing the population percent of drivers who always 
buckle up before riding in a car. 

a. When designing a study to determine this population proportion, what is the minimum num- 

ber you would need to survey to be 95% confident that the population proportion is esti- 
mated to within 0.03? 

b. If it was later determined that it was important to be more than 95% confident and a new survey 

was commissioned, how would that affect the minimum number you would need to survey? 
Why? 

Exercise 8.9.11 (Solution on p. 357.) 

Suppose that the insurance companies did do a survey. They randomly surveyed 400 drivers and 
found that 320 claimed to always buckle up. We are interested in the population proportion of 
drivers who claim to always buckle up. 



a. i. x = 
ii. n = 
iii. p' = 



b. Define the Random Variables X and P', in words. 

c. Which distribution should you use for this problem? Explain your choice. 
d. Construct a 95% confidence interval for the population proportion that claim to always buckle 
up. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. If this survey were done by telephone, list 3 difficulties the companies might have in obtaining 

random results. 

Exercise 8.9.12 

Unoccupied seats on flights cause airlines to lose revenue. Suppose a large airline wants to esti- 
mate its mean number of unoccupied seats per flight over the past year. To accomplish this, the 
records of 225 flights are randomly selected and the number of unoccupied seats is noted for each 
of the sampled flights. The sample mean is 11.6 seats and the sample standard deviation is 4.1 
seats. 

a. i. $\bar{x}$ = 
ii. $s_x$ = 
iii. n = 
iv. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 

c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 92% confidence interval for the population mean number of unoccupied seats per 

flight. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

Exercise 8.9.13 (Solution on p. 357.) 

According to a recent survey of 1200 people, 61% feel that the president is doing an acceptable 
job. We are interested in the population proportion of people who feel the president is doing an 
acceptable job. 

a. Define the Random Variables X and P', in words. 

b. Which distribution should you use for this problem? Explain your choice. 

c. Construct a 90% confidence interval for the population proportion of people who feel the pres- 

ident is doing an acceptable job. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

Exercise 8.9.14 

A survey of the mean amount of cents off that coupons give was done by randomly surveying one 
coupon per page from the coupon sections of a recent San Jose Mercury News. The following data 
were collected: 20¢; 75¢; 50¢; 65¢; 30¢; 55¢; 40¢; 40¢; 30¢; 55¢; $1.50; 40¢; 65¢; 40¢. Assume the 
underlying distribution is approximately normal. 

a. i. $\bar{x}$ = 
ii. $s_x$ = 
iii. n = 
iv. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 




c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 95% confidence interval for the population mean worth of coupons. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. If many random samples of size 14 were taken, what percent of the confidence intervals 
constructed should contain the population mean worth of coupons? Explain why. 

Exercise 8.9.15 (Solution on p. 358.) 

An article regarding interracial dating and marriage recently appeared in the Washington Post. Of 
the 1709 randomly selected adults, 315 identified themselves as Latinos, 323 identified themselves 
as blacks, 254 identified themselves as Asians, and 779 identified themselves as whites. In this 
survey, 86% of blacks said that their families would welcome a white person into their families. 
Among Asians, 77% would welcome a white person into their families, 71% would welcome a 
Latino, and 66% would welcome a black person. 

a. We are interested in finding the 95% confidence interval for the percent of all black families that 

would welcome a white person into their families. Define the Random Variables X and P', 
in words. 

b. Which distribution should you use for this problem? Explain your choice. 

c. Construct a 95% confidence interval 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

Exercise 8.9.16 

Refer to the problem above. 

a. Construct three 95% confidence intervals. 

i: Percent of all Asians that would welcome a white person into their families. 

ii: Percent of all Asians that would welcome a Latino into their families. 

iii: Percent of all Asians that would welcome a black person into their families. 

b. Even though the three point estimates are different, do any of the confidence intervals overlap? 

Which? 

c. For any intervals that do overlap, in words, what does this imply about the significance of the 

differences in the true proportions? 

d. For any intervals that do not overlap, in words, what does this imply about the significance of 

the differences in the true proportions? 

Exercise 8.9.17 (Solution on p. 358.) 

A camp director is interested in the mean number of letters each child sends during his/her camp 
session. The population standard deviation is known to be 2.5. A survey of 20 campers is taken. 
The mean from the sample is 7.9 with a sample standard deviation of 2.8. 

a. i. $\bar{x}$ = 
ii. $\sigma$ = 
iii. $s_x$ = 
iv. n = 
v. n - 1 = 

b. Define the Random Variables X and $\bar{X}$, in words. 
c. Which distribution should you use for this problem? Explain your choice. 

d. Construct a 90% confidence interval for the population mean number of letters campers send 

home. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

e. What will happen to the error bound and confidence interval if 500 campers are surveyed? 

Why? 

Exercise 8.9.18 

Stanford University conducted a study of whether running is healthy for men and women over 
age 50. During the first eight years of the study 1.5% of the 451 members of the 50-Plus Fitness 
Association died. We are interested in the proportion of people over 50 who ran and died in the 
same eight-year period. 

a. Define the Random Variables X and P', in words. 

b. Which distribution should you use for this problem? Explain your choice. 

c. Construct a 97% confidence interval for the population proportion of people over 50 who ran 

and died in the same eight-year period. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

d. Explain what a "97% confidence interval" means for this study. 

Exercise 8.9.19 (Solution on p. 358.) 

In a recent sample of 84 used car sales costs, the sample mean was $6425 with a standard deviation 
of $3156. Assume the underlying distribution is approximately normal. 

a. Which distribution should you use for this problem? Explain your choice. 

b. Define the Random Variable X, in words. 

c. Construct a 95% confidence interval for the population mean cost of a used car. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

d. Explain what a "95% confidence interval" means for this study. 

Exercise 8.9.20 

A telephone poll of 1000 adult Americans was reported in an issue of Time Magazine. One of the 
questions asked was "What is the main problem facing the country?" 20% answered "crime". We 
are interested in the population proportion of adult Americans who feel that crime is the main 
problem. 

a. Define the Random Variables X and P', in words. 

b. Which distribution should you use for this problem? Explain your choice. 

c. Construct a 95% confidence interval for the population proportion of adult Americans who feel 

that crime is the main problem. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

d. Suppose we want to lower the sampling error. What is one way to accomplish that? 
e. The sampling error given by Yankelovich Partners, Inc. (which conducted the poll) is ± 3%. In 
1-3 complete sentences, explain what the ± 3% represents. 

Exercise 8.9.21 (Solution on p. 358.) 

Refer to the above problem. Another question in the poll was "[How much are] you worried 
about the quality of education in our schools?" 63% responded "a lot". We are interested in the 
population proportion of adult Americans who are worried a lot about the quality of education in 
our schools. 

1. Define the Random Variables X and P', in words. 

2. Which distribution should you use for this problem? Explain your choice. 

3. Construct a 95% confidence interval for the population proportion of adult Americans wor- 
ried a lot about the quality of education in our schools. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

4. The sampling error given by Yankelovich Partners, Inc. (which conducted the poll) is ± 3%. 
In 1-3 complete sentences, explain what the ± 3% represents. 

Exercise 8.9.22 

Six different national brands of chocolate chip cookies were randomly selected at the supermarket. 
The grams of fat per serving are as follows: 8; 8; 10; 7; 9; 9. Assume the underlying distribution is 
approximately normal. 

a. Calculate a 90% confidence interval for the population mean grams of fat per serving of choco- 

late chip cookies sold in supermarkets. 

i. State the confidence interval. 

ii. Sketch the graph. 

iii. Calculate the error bound. 

b. If you wanted a smaller error bound while keeping the same level of confidence, what should 

have been changed in the study before it was done? 

c. Go to the store and record the grams of fat per serving of six brands of chocolate chip cookies. 

d. Calculate the mean. 

e. Is the mean within the interval you calculated in part (a)? Did you expect it to be? Why or why 

not? 

Exercise 8.9.23 

A confidence interval for a proportion is given to be (-0.22, 0.34). Why doesn't the lower limit of 
the confidence interval make practical sense? How should it be changed? Why? 

8.9.1 Try these multiple choice questions. 

The next three problems refer to the following: According to a Field Poll, 79% of California adults 
(actual results are 400 out of 506 surveyed) feel that "education and our schools" is one of the top is- 
sues facing California. We wish to construct a 90% confidence interval for the true proportion of Cali- 
fornia adults who feel that education and the schools is one of the top issues facing California. (Source: 
http://field.com/fieldpollonline/subscribers/) 

Exercise 8.9.24 (Solution on p. 358.) 

A point estimate for the true population proportion is: 
A. 0.90 

B. 1.27 

C. 0.79 

D. 400 

Exercise 8.9.25 (Solution on p. 358.) 

A 90% confidence interval for the population proportion is: 

A. (0.761,0.820) 

B. (0.125,0.188) 

C. (0.755,0.826) 

D. (0.130,0.183) 

Exercise 8.9.26 (Solution on p. 358.) 

The error bound is approximately 

A. 1.581 

B. 0.791 

C. 0.059 

D. 0.030 

The next two problems refer to the following: 

A quality control specialist for a restaurant chain takes a random sample of size 12 to check the amount of 
soda served in the 16 oz. serving size. The sample mean is 13.30 with a sample standard deviation of 1.55. 
Assume the underlying population is normally distributed. 

Exercise 8.9.27 (Solution on p. 358.) 

Find the 95% Confidence Interval for the true population mean for the amount of soda served. 

A. (12.42,14.18) 

B. (12.32,14.29) 

C. (12.50,14.10) 

D. Impossible to determine 

Exercise 8.9.28 (Solution on p. 358.) 

What is the error bound? 

A. 0.87 

B. 1.98 

C. 0.99 

D. 1.74 

Exercise 8.9.29 (Solution on p. 358.) 

What is meant by the term "90% confident" when constructing a confidence interval for a mean? 

A. If we took repeated samples, approximately 90% of the samples would produce the same con- 

fidence interval. 

B. If we took repeated samples, approximately 90% of the confidence intervals calculated from 

those samples would contain the sample mean. 

C. If we took repeated samples, approximately 90% of the confidence intervals calculated from 

those samples would contain the true value of the population mean. 

D. If we took repeated samples, the sample mean would equal the population mean in approxi- 

mately 90% of the samples. 
The next two problems refer to the following: 

Five hundred and eleven (511) homes in a certain southern California community are randomly surveyed 
to determine if they meet minimal earthquake preparedness recommendations. One hundred seventy-three 
(173) of the homes surveyed met the minimum recommendations for earthquake preparedness and 338 did 
not. 

Exercise 8.9.30 (Solution on p. 358.) 

Find the Confidence Interval at the 90% Confidence Level for the true population proportion of 
southern California community homes meeting at least the minimum recommendations for earth- 
quake preparedness. 

A. (0.2975,0.3796) 

B. (0.6270,0.6959) 

C. (0.3041,0.3730) 

D. (0.6204,0.7025) 

Exercise 8.9.31 (Solution on p. 358.) 

The point estimate for the population proportion of homes that do not meet the minimum recom- 
mendations for earthquake preparedness is: 

A. 0.6614 

B. 0.3386 

C. 173 

D. 338 
8.10 Review 11 

The next three problems refer to the following situation: Suppose that a sample of 15 randomly chosen 
people were put on a special weight loss diet. The amount of weight lost, in pounds, follows an unknown 
distribution with mean equal to 12 pounds and standard deviation equal to 3 pounds. Assume that the 
distribution for the weight loss is normal. 

Exercise 8.10.1 (Solution on p. 359.) 

To find the probability that the mean amount of weight lost by 15 people is no more than 14 
pounds, the random variable should be: 

A. The number of people who lost weight on the special weight loss diet 

B. The number of people who were on the diet 

C. The mean amount of weight lost by 15 people on the special weight loss diet 

D. The total amount of weight lost by 15 people on the special weight loss diet 

Exercise 8.10.2 (Solution on p. 359.) 

Find the probability asked for in the previous problem. 

Exercise 8.10.3 (Solution on p. 359.) 

Find the 90th percentile for the mean amount of weight lost by 15 people. 

The next three questions refer to the following situation: The time of occurrence of the first accident 
during rush-hour traffic at a major intersection is uniformly distributed between the three hour interval 4 
p.m. to 7 p.m. Let X = the amount of time (hours) it takes for the first accident to occur. 

• So, if an accident occurs at 4 p.m., the amount of time, in hours, it took for the accident to occur is _______. 

• σ² = _______ 



Exercise 8.10.4 (Solution on p. 359.) 

What is the probability that the time of occurrence is within the first half-hour or the last hour of 
the period from 4 to 7 p.m.? 

A. Cannot be determined from the information given 

B. I 

C. 1/2 

M 

Exercise 8.10.5 (Solution on p. 359.) 

The 20th percentile occurs after how many hours? 

A. 0.20 

B. 0.60 

C. 0.50 

D. 1 

Exercise 8.10.6 (Solution on p. 359.) 

Assume Ramon has kept track of the times for the first accidents to occur for 40 different days. Let 
C = the total cumulative time. Then C follows which distribution? 

A. U (0,3) 



11 This content is available online at <http://cnx.org/content/m16972/1.10/>. 




B. Exp (Y 

C. N (60, 5.477) 

D. N (1.5,0.01875) 

Exercise 8.10.7 (Solution on p. 359.) 

Using the information in question #6, find the probability that the total time for all first accidents 
to occur is more than 43 hours. 

The next two questions refer to the following situation: The length of time a parent must wait for his 
children to clean their rooms is uniformly distributed in the time interval from 1 to 15 days. 

Exercise 8.10.8 (Solution on p. 359.) 

How long must a parent expect to wait for his children to clean their rooms? 

A. 8 days 

B. 3 days 

C. 14 days 

D. 6 days 

Exercise 8.10.9 (Solution on p. 359.) 

What is the probability that a parent will wait more than 6 days given that the parent has already 
waited more than 3 days? 

A. 0.5174 

B. 0.0174 

C. 0.7500 

D. 0.2143 

The next five problems refer to the following study: Twenty percent of the students at a local community 
college live within five miles of the campus. Thirty percent of the students at the same community college 
receive some kind of financial aid. Of those who live within five miles of the campus, 75% receive some 
kind of financial aid. 

Exercise 8.10.10 (Solution on p. 359.) 

Find the probability that a randomly chosen student at the local community college does not live 
within five miles of the campus. 

A. 80% 

B. 20% 

C. 30% 

D. Cannot be determined 

Exercise 8.10.11 (Solution on p. 359.) 

Find the probability that a randomly chosen student at the local community college lives within 
five miles of the campus or receives some kind of financial aid. 

A. 50% 

B. 35% 

C. 27.5% 

D. 75% 

Exercise 8.10.12 (Solution on p. 359.) 

Based upon the above information, are living in student housing within five miles of the campus 
and receiving some kind of financial aid mutually exclusive? 
A. Yes 

B. No 

C. Cannot be determined 

Exercise 8.10.13 (Solution on p. 359.) 

The interest rate charged on the financial aid is _______ data. 

A. quantitative discrete 

B. quantitative continuous 

C. qualitative discrete 

D. qualitative 

Exercise 8.10.14 (Solution on p. 359.) 

What follows is information about the students who receive financial aid at the local community 
college. 

• 1st quartile = $250 

• 2nd quartile = $700 

• 3rd quartile = $1200 

(These amounts are for the school year.) If a sample of 200 students is taken, how many are 
expected to receive $250 or more? 

A. 50 

B. 250 

C. 150 

D. Cannot be determined 

The next two problems refer to the following information: P(A) = 0.2, P(B) = 0.3, A and B are 
independent events. 

Exercise 8.10.15 (Solution on p. 359.) 

P(A AND B) = 

A. 0.5 

B. 0.6 

C. 0 

D. 0.06 


Exercise 8.10.16 (Solution on p. 359.) 

P(A OR B) = 

A. 0.56 

B. 0.5 

C. 0.44 

D. 1 



Exercise 8.10.17 (Solution on p. 359.) 

If H and D are mutually exclusive events, P(H) = 0.25, P(D) = 0.15, then P(H|D) = 

A. 1 

B. 0 

C. 0.40 

D. 0.0375 




8.11 Lab 1: Confidence Interval (Home Costs) 12 

Class Time: 
Names: 

8.11.1 Student Learning Outcomes: 

• The student will calculate the 90% confidence interval for the mean cost of a home in the area in which 
this school is located. 

• The student will interpret confidence intervals. 

• The student will determine the effects that changing conditions have on the confidence interval. 



8.11.2 Collect the Data 

Check the Real Estate section in your local newspaper. (Note: many papers only list them one day per 
week. Also, we will assume that homes come up for sale randomly.) Record the sales prices for 35 randomly 
selected homes recently listed in the county. 

1. Complete the table: 



Table 8.2 



8.11.3 Describe the Data 

1. Compute the following: 

a. x̄ = 

b. s_x = 

c. n = 

2. Define the Random Variable X, in words. X = 

3. State the estimated distribution to use. Use both words and symbols. 



12 This content is available online at <http://cnx.org/content/m16960/1.11/>. 
8.11.4 Find the Confidence Interval 

1. Calculate the confidence interval and the error bound. 

a. Confidence Interval: 

b. Error Bound: 

2. How much area is in both tails (combined)? α = 

3. How much area is in each tail? α/2 = 

4. Fill in the blanks on the graph with the area in each section. Then, fill in the number line with the 
upper and lower limits of the confidence interval and the sample mean. 



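Figure 8.5 (graph to complete: blanks for the area α/2 in each tail and the confidence level CL in the middle, above a number line centered at x̄) 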



5. Some students think that a 90% confidence interval contains 90% of the data. Use the list of data on 
the first page and count how many of the data values lie within the confidence interval. What percent 
is this? Is this percent close to 90%? Explain why this percent should or should not be close to 90%. 



8.11.5 Describe the Confidence Interval 

1. In two to three complete sentences, explain what a Confidence Interval means (in general), as if you 
were talking to someone who has not taken statistics. 

2. In one to two complete sentences, explain what this Confidence Interval means for this particular 
study. 



8.11.6 Use the Data to Construct Confidence Intervals 

1. Using the above information, construct a confidence interval for each confidence level 
given. 



Confidence level | EBM / Error Bound | Confidence Interval 
50% | _______ | _______ 
80% | _______ | _______ 
95% | _______ | _______ 
99% | _______ | _______ 

Table 8.3 
2. What happens to the EBM as the confidence level increases? Does the width of the confidence interval 
increase or decrease? Explain why this happens. 
8.12 Lab 2: Confidence Interval (Place of Birth) 13 

Class Time: 
Names: 

8.12.1 Student Learning Outcomes: 

• The student will calculate the 90% confidence interval for the proportion of students in this school that 
were born in this state. 

• The student will interpret confidence intervals. 

• The student will determine the effects that changing conditions have on the confidence interval. 

8.12.2 Collect the Data 

1. Survey the students in your class, asking them if they were born in this state. Let X = the number that 
were born in this state. 

a. n = 

b. x= 



2. Define the Random Variable P' in words. 

3. State the estimated distribution to use. 



8.12.3 Find the Confidence Interval and Error Bound 

1. Calculate the confidence interval and the error bound. 

a. Confidence Interval: 

b. Error Bound: 

2. How much area is in both tails (combined)? α = 

3. How much area is in each tail? α/2 = 



4. Fill in the blanks on the graph with the area in each section. Then, fill in the number line with the 
upper and lower limits of the confidence interval and the sample proportion. 



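Figure 8.6 (graph to complete: blanks for the area α/2 in each tail and the confidence level in the middle, above a number line centered at p') 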



13 This content is available online at <http://cnx.org/content/m16961/1.11/>. 
8.12.4 Describe the Confidence Interval 

1. In two to three complete sentences, explain what a Confidence Interval means (in general), as if you 
were talking to someone who has not taken statistics. 

2. In one to two complete sentences, explain what this Confidence Interval means for this particular 
study. 

3. Using the above information, construct a confidence interval for each given confidence level. 



Confidence level | EBP / Error Bound | Confidence Interval 
50% | _______ | _______ 
80% | _______ | _______ 
95% | _______ | _______ 
99% | _______ | _______ 

Table 8.4 



4. What happens to the EBP as the confidence level increases? Does the width of the confidence interval 
increase or decrease? Explain why this happens. 
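
The table above can also be filled in numerically. The following is a minimal sketch, not part of the original lab, that assumes a hypothetical class survey (24 of 40 students born in this state) and the usual normal approximation for a proportion. The function name `proportion_interval` and the survey counts are illustrative assumptions; substitute your own n and x.

```python
from statistics import NormalDist

def proportion_interval(x, n, conf_level):
    """Return (EBP, lower, upper) for a normal-approximation interval for p."""
    p_prime = x / n                                       # sample proportion p'
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)    # critical value for this level
    ebp = z * (p_prime * (1 - p_prime) / n) ** 0.5        # error bound for the proportion
    return ebp, p_prime - ebp, p_prime + ebp

# Hypothetical survey (an assumption for illustration): 24 of 40 students.
levels = (0.50, 0.80, 0.95, 0.99)
results = {cl: proportion_interval(24, 40, cl) for cl in levels}
for cl, (ebp, lower, upper) in results.items():
    print(f"{cl:.0%}: EBP = {ebp:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

Comparing the printed rows is one way to check your written answer about how the error bound responds to the confidence level.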
8.13 Lab 3: Confidence Interval (Women's Heights) 14 

Class Time: 
Names: 



8.13.1 Student Learning Outcomes: 

• The student will calculate a 90% confidence interval using the given data. 

• The student will determine the relationship between the confidence level and the percent of con- 
structed intervals that contain the population mean. 



8.13.2 Given: 



1. Heights of 100 Women (in Inches) 



59.4  71.6  69.3  65.0  62.9  66.5  61.7  55.2  67.5  67.2 
63.8  62.9  63.0  63.9  68.7  65.5  61.9  69.6  58.7  63.4 
61.8  60.6  69.8  60.0  64.9  66.1  66.8  60.6  65.6  63.8 
61.3  59.2  64.1  59.3  64.9  62.4  63.5  60.9  63.3  66.3 
61.5  64.3  62.9  60.6  63.8  58.8  64.9  65.7  62.5  70.9 
62.9  63.1  62.2  58.7  64.7  66.0  60.5  64.7  65.4  60.2 
65.0  64.1  61.1  65.3  64.6  59.2  61.4  62.0  63.5  61.4 
65.5  62.3  65.5  64.7  58.8  66.1  64.9  66.9  57.9  69.8 
58.5  63.4  69.2  65.9  62.2  60.0  58.1  62.5  62.4  59.1 
66.4  61.2  60.4  58.7  66.7  67.5  63.2  56.6  67.7  62.5 



Table 8.5 

Listed above are the heights of 100 women. Use a random number generator to randomly select 10 
data values. 



14 This content is available online at <http://cnx.org/content/m16964/1.12/>. 
Calculate the sample mean and sample standard deviation. Assume that the population standard 
deviation is known to be 3.3 inches. With these values, construct a 90% confidence interval for your 
sample of 10 values. Write the confidence interval you obtained in the first space of the table below. 
Now write your confidence interval on the board. As others in the class write their confidence inter- 
vals on the board, copy them into the table below: 

90% Confidence Intervals 



Table 8.6 



8.13.3 Discussion Questions 

1. The actual population mean for the 100 heights given above is μ = 63.4. Using the class listing of 
confidence intervals, count how many of them contain the population mean μ; i.e., for how many 
intervals does the value of μ lie between the endpoints of the confidence interval? 

2. Divide this number by the total number of confidence intervals generated by the class to determine 
the percent of confidence intervals that contain the mean μ. Write this percent below. 

3. Is the percent of confidence intervals that contain the population mean μ close to 90%? 

4. Suppose we had generated 100 confidence intervals. What do you think would happen to the percent 
of confidence intervals that contained the population mean? 

5. When we construct a 90% confidence interval, we say that we are 90% confident that the true popu- 
lation mean lies within the confidence interval. Using complete sentences, explain what we mean 
by this phrase. 

6. Some students think that a 90% confidence interval contains 90% of the data. Use the list of data given 
(the heights of women) and count how many of the data values lie within the confidence interval that 
you generated on that page. How many of the 100 data values lie within your confidence interval? 
What percent is this? Is this percent close to 90%? 

7. Explain why it does not make sense to count data values that lie in a confidence interval. Think about 
the random variable that is being used in the problem. 

8. Suppose you obtained the heights of 10 women and calculated a confidence interval from this information. 
Without knowing the population mean μ, would you have any way of knowing for certain 
if your interval actually contained the value of μ? Explain. 



NOTE: This lab was designed and contributed by Diane Mathios. 
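
The class experiment in this lab can be imitated on a computer. The sketch below is an illustration under stated assumptions, not part of the original lab: instead of drawing from the 100 listed heights, it draws each sample of 10 from a normal model with mean 63.4 and the known standard deviation 3.3, then counts how often the 90% interval x̄ ± z·σ/√n contains the mean. The function name `coverage_rate`, the number of trials, and the seed are all hypothetical choices.

```python
import random
from statistics import NormalDist

def coverage_rate(population_mean, sigma, n, conf_level, trials, seed=0):
    """Fraction of simulated intervals (x-bar +/- z*sigma/sqrt(n)) containing the mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # about 1.645 for 90%
    ebm = z * sigma / n ** 0.5                            # error bound for the mean
    hits = 0
    for _ in range(trials):
        # Assumption: heights are modeled as normal rather than drawn from Table 8.5.
        xbar = sum(rng.gauss(population_mean, sigma) for _ in range(n)) / n
        if xbar - ebm <= population_mean <= xbar + ebm:
            hits += 1
    return hits / trials

rate = coverage_rate(63.4, 3.3, n=10, conf_level=0.90, trials=2000)
print(f"{rate:.3f} of the simulated 90% intervals contain the mean")
```

The printed fraction plays the role of the class percentage in Discussion Questions 2 through 4, with 2000 simulated "students" instead of one class.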
Solutions to Exercises in Chapter 8 

Solutions to Practice 1: Confidence Intervals for Averages, Known Population Standard Deviation 

Solution to Exercise 8.6.1 (p. 329) 
30.4 

Solution to Exercise 8.6.2 (p. 329) 
25 

Solution to Exercise 8.6.3 (p. 329) 
σ 

Solution to Exercise 8.6.4 (p. 329) 

the mean age of 25 randomly selected Winter Foothill students 
Solution to Exercise 8.6.5 (p. 329) 

μ 
Solution to Exercise 8.6.6 (p. 329) 

yes 

Solution to Exercise 8.6.7 (p. 329) 

Normal 

Solution to Exercise 8.6.8 (p. 329) 

0.05 

Solution to Exercise 8.6.9 (p. 329) 

0.025 

Solution to Exercise 8.6.10 (p. 329) 

a. 24.52 

b. 36.28 

c. 5.88 

Solution to Exercise 8.6.11 (p. 330) 

(24.52,36.28) 

Solutions to Practice 2: Confidence Intervals for Averages, Unknown Population Standard Deviation 

Solution to Exercise 8.7.1 (p. 331) 

a. 3.26 

b. 1.02 

c. 39 

Solution to Exercise 8.7.2 (p. 331) 
the mean number of colors of 39 flags 
Solution to Exercise 8.7.3 (p. 331) 

μ 
Solution to Exercise 8.7.4 (p. 331) 

No 

Solution to Exercise 8.7.5 (p. 331) 
t_38 

Solution to Exercise 8.7.6 (p. 332) 
0.05 

Solution to Exercise 8.7.7 (p. 332) 
0.025 
Solution to Exercise 8.7.8 (p. 332) 
a. 2.93 

b. 3.59 

c. 0.33 

Solution to Exercise 8.7.9 (p. 332) 

2.93; 3.59 

Solutions to Practice 3: Confidence Intervals for Proportions 

Solution to Exercise 8.8.2 (p. 333) 

The number of girls, age 8-12, in the beginning ice skating class 

Solution to Exercise 8.8.3 (p. 333) 

a. 64 

b. 80 

c. 0.8 

Solution to Exercise 8.8.4 (p. 333) 
B (80,0.80) 
Solution to Exercise 8.8.5 (p. 333) 

p 
Solution to Exercise 8.8.6 (p. 333) 

The proportion of girls, age 8-12, in the beginning ice skating class. 
Solution to Exercise 8.8.8 (p. 333) 

1 - 0.92 = 0.08 
Solution to Exercise 8.8.9 (p. 333) 

0.04 

Solution to Exercise 8.8.10 (p. 333) 

a. 0.72 

b. 0.88 

c. 0.08 

Solution to Exercise 8.8.11 (p. 334) 

(0.72; 0.88) 

Solutions to Homework 

Solution to Exercise 8.9.1 (p. 335) 

a. i. 71 
ii. 3 
iii. 2.8 
iv. 48 
v. 47 

c. N(71, 3/√48) 

d. i. CI: (70.15,71.85) 

iii. EB = 0.85 

Solution to Exercise 8.9.3 (p. 335) 

a. i. 8629 

ii. 6944 
iii. 35 
iv. 34 

c. t_34 

d. i. CI: (6244, 11,014) 

iii. EB = 2385 

e. It will become smaller 

Solution to Exercise 8.9.5 (p. 336) 

a. i. 8 

ii. 4 
iii. 81 
iv. 80 

d. i. CI: (7.12, 8.88) 

iii. EB = 0.88 

Solution to Exercise 8.9.7 (p. 337) 

a. i. 2 

ii. 0.1 
iii. 0.12 
iv. 16 
v. 15 

b. the weight of 1 small bag of candies 

c. the mean weight of 16 small bags of candies 

e. i. CI: (1.96, 2.04) 

iii. EB = 0.04 

f. i. CI: (1.94, 2.06) 

iii. EB = 0.06 

Solution to Exercise 8.9.9 (p. 338) 

a. i. 6 

ii. 3 
iii. 14 
iv. 13 

b. the time for a child to remove his training wheels 

c. the mean time for 14 children to remove their training wheels. 

d. t_13 

e. i. CI: (3.58, 8.42) 

iii. EB = 2.42 

Solution to Exercise 8.9.11 (p. 338) 



a. i. 320 
ii. 400 
iii. 0.80 

c. N(0.80, √((0.80)(0.20)/400)) 

d. i. CI: (0.76, 0.84) 

iii. EB = 0.04 

Solution to Exercise 8.9.13 (p. 339) 
b. N(0.61, √((0.61)(0.39)/1200)) 



c. i. CI: (0.59, 0.63) 
iii. EB = 0.02 

Solution to Exercise 8.9.15 (p. 340) 



b. N(0.86, √((0.86)(0.14)/323)) 



c. i. CI: (0.823, 0.898) 
iii. EB = 0.038 

Solution to Exercise 8.9.17 (p. 340) 



a. i. 7.9 
ii. 2.5 
iii. 2.8 
iv. 20 
v. 19 

c. N(7.9, 2.5/√20) 

d. i. CI: (6.98, 8.82) 

iii. EB: 0.92 



Solution to Exercise 8.9.19 (p. 341) 

a. t_83 

b. mean cost of 84 used cars 

c. i. CI: (5740.10, 7109.90) 

iii. EB = 684.90 

Solution to Exercise 8.9.21 (p. 342) 



b. N(0.63, √((0.63)(0.37)/n)), where n is the sample size 

c. i. CI: (0.60, 0.66) 

iii. EB = 0.03 



Solution to Exercise 8.9.24 (p. 342) 

C 

Solution to Exercise 8.9.25 (p. 343) 

A 

Solution to Exercise 8.9.26 (p. 343) 

D 

Solution to Exercise 8.9.27 (p. 343) 

B 

Solution to Exercise 8.9.28 (p. 343) 

C 

Solution to Exercise 8.9.29 (p. 343) 

C 

Solution to Exercise 8.9.30 (p. 344) 

C 

Solution to Exercise 8.9.31 (p. 344) 

A 
Solutions to Review 

Solution to Exercise 8.10.1 (p. 345) 

C 

Solution to Exercise 8.10.2 (p. 345) 

0.9951 

Solution to Exercise 8.10.3 (p. 345) 

12.99 

Solution to Exercise 8.10.4 (p. 345) 

C 

Solution to Exercise 8.10.5 (p. 345) 

B 

Solution to Exercise 8.10.6 (p. 345) 

C 

Solution to Exercise 8.10.7 (p. 346) 

0.9990 

Solution to Exercise 8.10.8 (p. 346) 

A 

Solution to Exercise 8.10.9 (p. 346) 

C 

Solution to Exercise 8.10.10 (p. 346) 

A 

Solution to Exercise 8.10.11 (p. 346) 

B 

Solution to Exercise 8.10.12 (p. 346) 

B 

Solution to Exercise 8.10.13 (p. 347) 

B 

Solution to Exercise 8.10.14 (p. 347) 

C. 150 

Solution to Exercise 8.10.15 (p. 347) 

D 

Solution to Exercise 8.10.16 (p. 347) 

C 

Solution to Exercise 8.10.17 (p. 347) 

B 
Chapter 9 

Hypothesis Testing: Single Mean and 
Single Proportion 

9.1 Hypothesis Testing: Single Mean and Single Proportion 1 

9.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Differentiate between Type I and Type II Errors 

• Describe hypothesis testing in general and in practice 

• Conduct and interpret hypothesis tests for a single population mean, population standard deviation 
known. 

• Conduct and interpret hypothesis tests for a single population mean, population standard deviation 
unknown. 

• Conduct and interpret hypothesis tests for a single population proportion. 

9.1.2 Introduction 

One job of a statistician is to make statistical inferences about populations based on samples taken from the 
population. Confidence intervals are one way to estimate a population parameter. Another way to make 
a statistical inference is to make a decision about a parameter. For instance, a car dealer advertises that 
its new small truck gets 35 miles per gallon, on the average. A tutoring service claims that its method of 
tutoring helps 90% of its students get an A or a B. A company says that women managers in their company 
earn an average of $60,000 per year. 

A statistician will make a decision about these claims. This process is called "hypothesis testing." A hy- 
pothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a 
decision as to whether or not there is sufficient evidence based upon analyses of the data, to reject the null 
hypothesis. 

In this chapter, you will conduct hypothesis tests on single means and single proportions. You will also 
learn about the errors associated with these tests. 

Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, 
and a conclusion. To perform a hypothesis test, a statistician will: 



1 This content is available online at <http://cnx.org/content/m16997/1.11/>. 

1. Set up two contradictory hypotheses. 

2. Collect sample data (in homework problems, the data or summary statistics will be given to you). 

3. Determine the correct distribution to perform the hypothesis test. 

4. Analyze sample data by performing the calculations that ultimately will allow you to reject or fail to 
reject the null hypothesis. 

5. Make a decision and write a meaningful conclusion. 

NOTE: To do the hypothesis test homework problems for this chapter and later chapters, make 
copies of the appropriate special solution sheets. See the Table of Contents topic "Solution Sheets". 
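
The five steps above can be sketched in code for one common case, a test of a single mean when the population standard deviation is known. This is a minimal illustration rather than a prescribed method: the function name `z_test_mean`, the cutoff `alpha` used to turn the probability into a decision, and the sample numbers for the truck claim are all assumptions made for the example.

```python
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="right"):
    """One-sample test of a mean with known population standard deviation."""
    # Step 3: under the null hypothesis the sample mean is modeled as N(mu0, sigma/sqrt(n)).
    z = (xbar - mu0) / (sigma / n ** 0.5)
    # Step 4: probability of a result at least this extreme if the null hypothesis is true.
    std_normal = NormalDist()
    if tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "left":
        p_value = std_normal.cdf(z)
    else:  # two-tailed
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 5: decision (alpha is an assumed cutoff, chosen before looking at the data).
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return z, p_value, decision

# Steps 1 and 2 with hypothetical data for the truck claim:
# null hypothesis: mean mpg is at least 35; alternate: mean mpg is less than 35.
z, p_value, decision = z_test_mean(xbar=34.2, mu0=35, sigma=2.0, n=40, tail="left")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

A meaningful conclusion would still be written in words, in the context of the claim being tested.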



9.2 Null and Alternate Hypotheses 2 

The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternate 
hypothesis. These hypotheses contain opposing viewpoints. 

H₀: The null hypothesis: It is a statement about the population that will be assumed to be true unless it 
can be shown to be incorrect beyond a reasonable doubt. 

Hₐ: The alternate hypothesis: It is a claim about the population that is contradictory to H₀ and what we 
conclude when we reject H₀. 

Example 9.1 

H₀: No more than 30% of the registered voters in Santa Clara County voted in the primary election. 

Hₐ: More than 30% of the registered voters in Santa Clara County voted in the primary election. 

Example 9.2 

We want to test whether the mean grade point average in American colleges is different from 2.0 
(out of 4.0). 

H₀: μ = 2.0    Hₐ: μ ≠ 2.0 

Example 9.3 

We want to test if college students take less than five years to graduate from college, on the aver- 
age. 

H₀: μ ≥ 5    Hₐ: μ < 5 

Example 9.4 

In an issue of U. S. News and World Report, an article on school standards stated that about half 
of all students in France, Germany, and Israel take advanced placement exams and a third pass. 
The same article stated that 6.6% of U. S. students take advanced placement exams and 4.4% pass. 
Test if the percentage of U. S. students who take advanced placement exams is more than 6.6%. 

H₀: p = 0.066    Hₐ: p > 0.066 

Since the null and alternate hypotheses are contradictory, you must examine evidence to decide if you have 
enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data. 

After you have determined which hypothesis the sample supports, you make a decision. There are two 
options for a decision. They are "reject H₀" if the sample information favors the alternate hypothesis or "do 
not reject H₀" or "fail to reject H₀" if the sample information is insufficient to reject the null hypothesis. 



2 This content is available online at <http://cnx.org/content/m16998/1.14/>. 
Mathematical Symbols Used in H₀ and Hₐ: 

H₀ | Hₐ 
equal (=) | not equal (≠) or greater than (>) or less than (<) 
greater than or equal to (≥) | less than (<) 
less than or equal to (≤) | more than (>) 

Table 9.1 



NOTE: H₀ always has a symbol with an equal in it. Hₐ never has a symbol with an equal in it. The 
choice of symbol depends on the wording of the hypothesis test. However, be aware that many 
researchers (including one of the co-authors in research work) use = in the Null Hypothesis, even 
with > or < as the symbol in the Alternate Hypothesis. This practice is acceptable because we 
only make the decision to reject or not reject the Null Hypothesis. 



9.2.1 Optional Collaborative Classroom Activity 

Bring to class a newspaper, some news magazines, and some Internet articles. In groups, find articles from 
which your group can write null and alternate hypotheses. Discuss your hypotheses with the rest of the 
class. 

9.3 Outcomes and the Type I and Type II Errors 3 

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or 
falseness) of the null hypothesis H₀ and the decision to reject or not. The outcomes are summarized in the 
following table: 



ACTION | H₀ IS ACTUALLY True | H₀ IS ACTUALLY False 
Do not reject H₀ | Correct Outcome | Type II error 
Reject H₀ | Type I Error | Correct Outcome 

Table 9.2 



The four possible outcomes in the table are: 

• The decision is to not reject H₀ when, in fact, H₀ is true (correct decision). 

• The decision is to reject H₀ when, in fact, H₀ is true (incorrect decision known as a Type I error). 

• The decision is to not reject H₀ when, in fact, H₀ is false (incorrect decision known as a Type II error). 

• The decision is to reject H₀ when, in fact, H₀ is false (correct decision whose probability is called the 
Power of the Test). 
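
The four outcomes listed above amount to a small lookup on two yes/no facts. The following sketch (the function name `classify_outcome` is a hypothetical label, not something from the text) encodes that lookup so each combination can be checked against the table.

```python
def classify_outcome(h0_is_true: bool, reject_h0: bool) -> str:
    """Name the outcome of a hypothesis test given the truth of H0 and the decision."""
    if h0_is_true:
        # Rejecting a true null hypothesis is a Type I error.
        return "Type I error" if reject_h0 else "correct decision"
    # Failing to reject a false null hypothesis is a Type II error.
    return "correct decision" if reject_h0 else "Type II error"

for truth in (True, False):
    for reject in (False, True):
        print(f"H0 true: {truth}, reject H0: {reject} -> {classify_outcome(truth, reject)}")
```

In practice the first argument is never known, which is why the two error types are described through their probabilities.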

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities. 

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the 
null hypothesis is true. 



3 This content is available online at <http://cnx.org/content/m17006/1.8/>. 
β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when 
the null hypothesis is false. 

α and β should be as small as possible because they are probabilities of errors. They are rarely 0. 

The Power of the Test is 1 − β. Ideally, we want a high power that is as close to 1 as possible. Increasing the 
sample size can increase the Power of the Test. 
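
To see how sample size affects power, here is a hedged sketch for one specific setting: a right-tailed test of a mean with known standard deviation. The function name `power_right_tailed` and every number in the example (null value 35, true mean 36, σ = 3, α = 0.05) are assumptions chosen for illustration; they do not come from the text.

```python
from statistics import NormalDist

def power_right_tailed(mu0, mu_true, sigma, n, alpha):
    """Power (1 - beta) of a right-tailed test of a mean when the true mean is mu_true."""
    se = sigma / n ** 0.5
    # Reject the null hypothesis when the sample mean exceeds this cutoff,
    # chosen so that P(Type I error) = alpha when the mean equals mu0.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(do not reject | true mean is mu_true) = P(sample mean <= cutoff).
    beta = NormalDist(mu_true, se).cdf(cutoff)
    return 1 - beta

sample_sizes = (10, 40, 160)
powers = [power_right_tailed(mu0=35, mu_true=36, sigma=3, n=n, alpha=0.05)
          for n in sample_sizes]
for n, power in zip(sample_sizes, powers):
    print(f"n = {n}: power = {power:.3f}")
```

With the other quantities held fixed, the printed power rises as n grows, which is the relationship the paragraph above describes.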

The following are examples of Type I and Type II errors. 

Example 9.5 

Suppose the null hypothesis, H₀, is: Frank's rock climbing equipment is safe. 

Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really 
is safe. Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it 
is not safe. 

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it 
really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in 
fact, it is not safe. 

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks 
his rock climbing equipment is safe, he will go ahead and use it.) 

Example 9.6 

Suppose the null hypothesis, H₀, is: The victim of an automobile accident is alive when he arrives 
at the emergency room of a hospital. 

Type I error: The emergency crew thinks that the victim is dead when, in fact, the victim is alive. 
Type II error: The emergency crew does not know if the victim is alive when, in fact, the victim is 
dead. 

α = probability that the emergency crew thinks the victim is dead when, in fact, he is really alive 
= P(Type I error). β = probability that the emergency crew does not know if the victim is alive 
when, in fact, the victim is dead = P(Type II error). 

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim 
is dead, they will not treat him.) 



9.4 Distribution Needed for Hypothesis Testing 4 

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with 
hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's-t 
distribution. (Remember, use a Student's-t distribution when the population standard deviation is unknown 
and the distribution of the sample mean is approximately normal.) In this chapter we perform tests of a 
population proportion using a normal distribution (usually n is large or the sample size is large). 

If you are testing a single population mean, the distribution for the test is for means: 

X̄ ~ N(μ_X, σ_X/√n) or t_df 



4 This content is available online at <http://cnx.org/content/m17017/1.13/>. 
The population parameter is μ. The estimated value (point estimate) for μ is x̄, the sample mean. 

If you are testing a single population proportion, the distribution for the test is for proportions or percentages: 

P' ~ N(p, √(pq/n)) 

The population parameter is p. The estimated value (point estimate) for p is p'. p' = x/n where x is the 
number of successes and n is the sample size. 
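
As a worked illustration of the proportion case, the sketch below computes the point estimate p' = x/n, the standard deviation √(pq/n) under the null hypothesis, and a right-tail probability. The counts (45 of 500 students) are hypothetical, the null value 0.066 is borrowed from Example 9.4, and the function name `proportion_test` is an assumption for this example only.

```python
from statistics import NormalDist

def proportion_test(x, n, p0):
    """Right-tailed normal-approximation test of a single proportion."""
    p_prime = x / n                          # point estimate p' = x/n
    sd = (p0 * (1 - p0) / n) ** 0.5          # sqrt(pq/n), using the null value of p
    z = (p_prime - p0) / sd                  # how many standard deviations p' is from p0
    p_value = 1 - NormalDist(p0, sd).cdf(p_prime)   # P(P' > p') if the null is true
    return p_prime, sd, z, p_value

# Hypothetical sample: 45 of 500 students took an advanced placement exam.
p_prime, sd, z, p_value = proportion_test(x=45, n=500, p0=0.066)
print(f"p' = {p_prime:.3f}, sd = {sd:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```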

9.5 Assumption 5 

When you perform a hypothesis test of a single population mean μ using a Student's-t distribution (often 
called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. 
Your data should be a simple random sample that comes from a population that is approximately 
normally distributed. You use the sample standard deviation to approximate the population standard 
deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not 
approximately normally distributed). 

When you perform a hypothesis test of a single population mean μ using a normal distribution (often 
called a z-test), you take a simple random sample from the population. The population you are testing 
is normally distributed or your sample size is sufficiently large. You know the value of the population 
standard deviation. 

When you perform a hypothesis test of a single population proportion p, you take a simple random 
sample from the population. You must meet the conditions for a binomial distribution: there are 
a certain number n of independent trials, the outcome of any trial is success or failure, and each trial has 
the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape 
of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 
and nq > 5). Then the binomial distribution of the sample (estimated) proportion can be approximated by the 
normal distribution with μ = p and σ = √(pq/n). Remember that q = 1 − p. 
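
The np > 5 and nq > 5 condition is easy to check mechanically. A minimal sketch follows; the function name `normal_approx_params` and the two example inputs are assumptions for illustration.

```python
def normal_approx_params(n, p):
    """Check np > 5 and nq > 5; if both hold, return (mu, sigma) for the sample proportion."""
    q = 1 - p
    if n * p > 5 and n * q > 5:
        # The sample proportion is then approximately N(p, sqrt(pq/n)).
        return p, (p * q / n) ** 0.5
    return None  # the normal approximation is not recommended

print(normal_approx_params(400, 0.80))   # both np and nq exceed 5
print(normal_approx_params(20, 0.10))    # np = 2, so the condition fails
```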

9.6 Rare Events 6 

Suppose you make an assumption about a property of the population (this assumption is the null hypoth- 
esis). Then you gather sample data randomly. If the sample has properties that would be very unlikely 
to occur if the assumption is true, then you would conclude that your assumption about the population is 
probably incorrect. (Remember that your assumption is just an assumption - it is not a fact and it may or 
may not be true. But your sample data are real and the data are showing you a fact that seems to contradict 
your assumption.) 

For example, Didi and Ali are at a birthday party of a very wealthy friend. They hurry to be first in line 
to grab a prize from a tall basket that they cannot see inside because they will be blindfolded. There are 
200 plastic bubbles in the basket and Didi and Ali have been told that there is only one with a $100 bill. 
Didi is the first person to reach into the basket and pull out a bubble. Her bubble contains a $100 bill. The 
probability of this happening is 1/200 = 0.005. Because this is so unlikely, Ali is hoping that what the two 
of them were told is wrong and there are more $100 bills in the basket. A "rare event" has occurred (Didi 
getting the $100 bill) so Ali doubts the assumption about only one $100 bill being in the basket. 
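
Ali's reasoning can be made concrete by asking how likely Didi's result would be under different assumptions about the basket. In the sketch below, the alternative counts of 10 and 50 bills are hypothetical values chosen only for comparison, and `prob_first_draw_wins` is an illustrative name.

```python
from fractions import Fraction

def prob_first_draw_wins(bills, bubbles=200):
    """Probability that the first bubble drawn contains a $100 bill."""
    return Fraction(bills, bubbles)

# bills = 1 is the assumption Didi and Ali were told; 10 and 50 are hypothetical alternatives.
for bills in (1, 10, 50):
    p = prob_first_draw_wins(bills)
    print(f"{bills} bill(s) in 200 bubbles: probability = {float(p):.3f}")
```

Didi's result is far less surprising under the alternatives than under the stated assumption, which is why Ali doubts that assumption.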

5 This content is available online at <http://cnx.org/content/m17002/1.16/>. 
6 This content is available online at <http://cnx.org/content/m16994/1.8/>. 
9.7 Using the Sample to Support One of the Hypotheses 7 

Use the sample data to calculate the actual probability of getting the test result, called the p-value. The 
p-value is the probability that, if the null hypothesis is true, the results from another randomly selected 
sample will be as extreme as or more extreme than the results obtained from the given sample. 

A large p-value calculated from the data indicates that we should fail to reject the null hypothesis. The 
smaller the p-value, the more unlikely the outcome, and the stronger the evidence is against the null hy- 
pothesis. We would reject the null hypothesis if the evidence is strongly against it. 
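
The definition of the p-value can be acted out by simulation: assume the null hypothesis is true, draw many new samples, and count how often the result is at least as extreme as the one observed. The sketch below does this for a right-tailed test of a mean. All numbers (null mean 50, σ = 4, n = 16, observed mean 52), the normal model for the data, and the function name `simulated_p_value` are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def simulated_p_value(observed_mean, mu0, sigma, n, trials=20000, seed=1):
    """Estimate P(sample mean >= observed_mean) when the null hypothesis (mean = mu0) is true."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Draw another random sample under the null hypothesis and take its mean.
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if sample_mean >= observed_mean:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

estimate = simulated_p_value(observed_mean=52, mu0=50, sigma=4, n=16)
exact = 1 - NormalDist(50, 4 / 16 ** 0.5).cdf(52)   # same probability from the normal curve
print(f"simulated p-value = {estimate:.4f}, normal-curve p-value = {exact:.4f}")
```

The two printed values should be close; the simulated one varies slightly with the seed and the number of trials.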

Draw a graph that shows the p-value. The hypothesis test is easier to perform if you use a graph because 
you see the problem more clearly. 

Example 9.7: (to illustrate the p-value) 

Suppose a baker claims that his bread height is more than 15 cm, on the average. Several of his 
customers do not believe him. To persuade his customers that he is right, the baker decides to do a 
hypothesis test. He bakes 10 loaves of bread. The mean height of the sample loaves is 17 cm. The 
baker knows from baking hundreds of loaves of bread that the standard deviation for the height 
is 0.5 cm. and the distribution of heights is normal. 

The null hypothesis could be H0: μ ≤ 15. The alternate hypothesis is Ha: μ > 15. 

The words "is more than" translate as a ">", so "μ > 15" goes into the alternate hypothesis. The 
null hypothesis must contradict the alternate hypothesis. 



Since σ is known (σ = 0.5 cm), the distribution of the sample mean is known to be normal with 
mean μ = 15 and standard deviation σ/√n = 0.5/√10 = 0.16. 



Suppose the null hypothesis is true (the mean height of the loaves is no more than 15 cm). Then 
is the mean height (17 cm) calculated from the sample unexpectedly large? The hypothesis test 
works by asking the question how unlikely the sample mean would be if the null hypothesis 
were true. The graph shows how far out the sample mean is on the normal curve. The p-value is 
the probability that, if we were to take other samples, any other sample mean would fall at least 
as far out as 17 cm. 

The p-value, then, is the probability that a sample mean is the same or greater than 17 cm. 
when the population mean is, in fact, 15 cm. We can calculate this probability using the normal 
distribution for means from Chapter 7. 

[Figure: normal curve centered at 15 with the area to the right of 17 shaded; the p-value is approximately 0.] 

p-value = P(x̄ > 17), which is approximately 0. 

7 This content is available online at <http://cnx.org/content/m16995/1.17/>. 

A p-value of approximately 0 tells us that it is highly unlikely that a loaf of bread rises no more 
than 15 cm, on the average. That is, almost 0% of all samples of 10 loaves would have a mean height 
of at least 17 cm purely by CHANCE had the population mean height really been 15 cm. Because the 
outcome of 17 cm is so unlikely (meaning it is happening NOT by chance alone), we conclude 
that the evidence is strongly against the null hypothesis (the mean height is at most 15 cm). There 
is sufficient evidence that the true mean height for the population of the baker's loaves of bread is 
greater than 15 cm. 
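
The baker's p-value can be checked numerically. The following is a minimal sketch, not part of the original text: the function name upper_tail_p is made up for illustration, and it assumes the setup stated in the example (σ = 0.5 known, n = 10, sample mean 17, null mean 15). It uses math.erfc because the tail area is far too small for a "1 minus cdf" calculation to resolve.

```python
import math

def upper_tail_p(sample_mean, null_mean, sigma, n):
    """Return (z, P(sample mean >= observed)) when sigma is known.

    Hypothetical helper for illustration only.
    """
    standard_error = sigma / math.sqrt(n)          # sigma / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Upper-tail area of the standard normal; erfc keeps precision far out in the tail.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

z, p_value = upper_tail_p(sample_mean=17, null_mean=15, sigma=0.5, n=10)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

The sample mean sits more than 12 standard errors above 15, which is why the text reports the p-value as approximately 0.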



9.8 Decision and Conclusion 8 

A systematic way to make a decision of whether to reject or not reject the null hypothesis is to compare the 
p-value and a preset or preconceived α (also called a "significance level"). A preset α is the probability of 
a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given 
to you at the beginning of the problem. 

When you make a decision to reject or not reject H0, do as follows: 

• If α > p-value, reject H0. The results of the sample data are significant. There is sufficient evidence to 
conclude that H0 is an incorrect belief and that the alternative hypothesis, Ha, may be correct. 

• If α ≤ p-value, do not reject H0. The results of the sample data are not significant. There is not 
sufficient evidence to conclude that the alternative hypothesis, Ha, may be correct. 

• When you "do not reject H0", it does not mean that you should believe that H0 is true. It simply 
means that the sample data have failed to provide sufficient evidence to cast serious doubt about the 
truthfulness of H0. 

Conclusion: After you make your decision, write a thoughtful conclusion about the hypotheses in terms 
of the given problem. 
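
The decision rule above is mechanical enough to write as a tiny function. This is only a sketch: the name decide and the returned strings are illustrative choices, not notation from the text. The thoughtful conclusion in the context of the problem still has to be written by hand.

```python
def decide(alpha, p_value):
    """Apply the rule: reject the null hypothesis only when alpha > p-value."""
    if alpha > p_value:
        return "reject H0"
    return "do not reject H0"

# Two illustrative calls: a small p-value and a large one.
print(decide(0.05, 0.0187))
print(decide(0.025, 0.1323))
```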

9.9 Additional Information 9 

• In a hypothesis test problem, you may see words such as "the level of significance is 1%." The "1%" is 
the preconceived or preset α. 

• The statistician setting up the hypothesis test selects the value of α to use before collecting the sample 
data. 

• If no level of significance is given, the accepted standard is to use α = 0.05. 

• When you calculate the p-value and draw the picture, the p-value is the area in the left tail, the right 
tail, or split evenly between the two tails. For this reason, we call the hypothesis test left, right, or two 
tailed. 

• The alternate hypothesis, Ha, tells you if the test is left, right, or two-tailed. It is the key to conducting 
the appropriate test. 

• Ha never has a symbol that contains an equal sign. 

• Thinking about the meaning of the p-value: A data analyst (and anyone else) should have more 
confidence that he made the correct decision to reject the null hypothesis with a smaller p-value (for 
example, 0.001 as opposed to 0.04) even if using the 0.05 level for alpha. Similarly, for a large p-value 
like 0.4, as opposed to a p-value of 0.056 (alpha = 0.05 is less than either number), a data analyst should 
have more confidence that she made the correct decision in failing to reject the null hypothesis. This 
makes the data analyst use judgment rather than mindlessly applying rules. 

8 This content is available online at <http://cnx.org/content/m16992/1.11/>. 
9 This content is available online at <http://cnx.org/content/m16999/1.13/>. 

The following examples illustrate a left, right, and two-tailed test. 

Example 9.8 

H0: μ = 5    Ha: μ < 5 

Test of a single population mean. Ha tells you the test is left-tailed. The picture of the p-value is as 
follows: 

[Figure: normal curve with the left tail shaded; the shaded area is the p-value.] 




Example 9.9 

H0: p ≤ 0.2    Ha: p > 0.2 

This is a test of a single population proportion. Ha tells you the test is right-tailed. The picture of 
the p-value is as follows: 

[Figure: normal curve with the right tail shaded; the shaded area is the p-value.] 




Example 9.10 

H0: μ = 50    Ha: μ ≠ 50 

This is a test of a single population mean. Ha tells you the test is two-tailed. The picture of the 
p-value is as follows. 

[Figure: normal curve centered at 50 with both tails shaded; each shaded tail has area (1/2)(p-value).] 
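
The three pictures correspond to three ways of turning a z-score into a p-value. The sketch below assumes a normal test statistic, and the function name p_value_from_z and the tail labels are illustrative, not from the text. It relies on statistics.NormalDist from the Python standard library.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """Area under the standard normal curve that matches the form of Ha.

    tail = "left"  for Ha with <
    tail = "right" for Ha with >
    tail = "two"   for Ha with "not equal"
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Split evenly between the two tails, so double one tail.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_z(-1.5, tail), 4))
```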
9.10 Summary of the Hypothesis Test 10 

The hypothesis test itself has an established process. This can be summarized as follows: 

1. Determine H0 and Ha. Remember, they are contradictory. 

2. Determine the random variable. 

3. Determine the distribution for the test. 

4. Draw a graph, calculate the test statistic, and use the test statistic to calculate the p-value. (A z-score 
and a t-score are examples of test statistics.) 

5. Compare the preconceived α with the p-value, make a decision (reject or do not reject H0), and write 
a clear conclusion using English sentences. 

Notice that in performing the hypothesis test, you use α and not β. β is needed to help determine the 
sample size of the data that is used in calculating the p-value. Remember that the quantity 1 − β is called 
the Power of the Test. A high power is desirable. If the power is too low, statisticians typically increase the 
sample size while keeping α the same. If the power is low, the null hypothesis might not be rejected when 
it should be. 
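
To see why increasing the sample size raises the power, here is a rough sketch for a left-tailed test of a mean with known σ. Everything specific in it is an assumption made for illustration: the function name power_left_tailed, the null mean 16.43, σ = 0.8, and especially the "true" mean of 16.0, which would never be known in practice.

```python
from math import sqrt
from statistics import NormalDist

def power_left_tailed(mu0, mu_true, sigma, n, alpha):
    """Power = 1 - beta = P(reject H0 | the true mean is mu_true)."""
    standard_error = sigma / sqrt(n)
    # Largest sample mean that still leads to rejecting H0 at level alpha.
    critical_mean = mu0 + NormalDist().inv_cdf(alpha) * standard_error
    # Probability that the sample mean falls below that cutoff when mu_true holds.
    return NormalDist(mu_true, standard_error).cdf(critical_mean)

# Same alpha, growing sample size (hypothetical values).
for n in (15, 30, 60):
    print(n, round(power_left_tailed(16.43, 16.0, 0.8, n, alpha=0.05), 3))
```

With α held fixed, the printed power grows as n grows, which matches the remark that statisticians increase the sample size when the power is too low.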



9.11 Examples 11 



Example 9.11 

Jeffrey, as an eight-year old, established a mean time of 16.43 seconds for swimming the 25-yard 
freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could 
swim the 25-yard freestyle faster by using goggles. Frank bought Jeffrey a new pair of expensive 
goggles and timed Jeffrey for 15 25-yard freestyle swims. For the 15 swims, Jeffrey's mean time 
was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 
seconds. Conduct a hypothesis test using a preset α = 0.05. Assume that the swim times for the 
25-yard freestyle are normal. 

Solution 

Set up the Hypothesis Test: 

Since the problem is about a mean, this is a test of a single population mean. 

H0: μ = 16.43    Ha: μ < 16.43 

For Jeffrey to swim faster, his time will be less than 16.43 seconds. The "<" tells you this is 
left-tailed. 

Determine the distribution needed: 

Random variable: X̄ = the mean time to swim the 25-yard freestyle. 

Distribution for the test: X̄ is normal (population standard deviation is known: σ = 0.8) 

X̄ ~ N(μ, σ_X/√n). Therefore, X̄ ~ N(16.43, 0.8/√15) 

μ = 16.43 comes from H0 and not the data. σ = 0.8, and n = 15. 

Calculate the p-value using the normal distribution for a mean: 



10 This content is available online at <http://cnx.org/content/m16993/1.6/>. 
11 This content is available online at <http://cnx.org/content/m17005/1.25/>. 

p-value = P(x̄ < 16) = 0.0187 where the sample mean in the problem is given as 16. 

p-value = 0.0187 (This is called the actual level of significance.) The p-value is the area to the left 
of the sample mean, which is given as 16. 

Graph: 

[Figure 9.1: normal curve centered at μ = 16.43 with the area to the left of x̄ = 16 shaded as the p-value.] 

μ = 16.43 comes from H0. Our assumption is μ = 16.43. 



Interpretation of the p-value: If H0 is true, there is a 0.0187 probability (1.87%) that Jeffrey's mean 
time to swim the 25-yard freestyle is 16 seconds or less. Because a 1.87% chance is small, the mean 
time of 16 seconds or less is unlikely to have happened randomly. It is a rare event. 

Compare α and the p-value: 

α = 0.05    p-value = 0.0187    α > p-value 

Make a decision: Since α > p-value, reject H0. 

This means that you reject μ = 16.43. In other words, you do not think Jeffrey swims the 25-yard 
freestyle in 16.43 seconds but faster with the new goggles. 

Conclusion: At the 5% significance level, we conclude that Jeffrey swims faster using the new 
goggles. The sample data show there is sufficient evidence that Jeffrey's mean time to swim the 
25-yard freestyle is less than 16.43 seconds. 
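
For readers without a calculator, the same left-tailed test can be sketched in Python with the standard library. The variable names are illustrative. The numbers are the ones given in the example (null mean 16.43, σ = 0.8, n = 15, sample mean 16, α = 0.05).

```python
from math import sqrt
from statistics import NormalDist

null_mean = 16.43    # from H0, not from the data
sigma = 0.8          # known population standard deviation
n = 15
sample_mean = 16.0
alpha = 0.05

standard_error = sigma / sqrt(n)
z = (sample_mean - null_mean) / standard_error

# Left-tailed test: area to the left of the sample mean under the null distribution.
p_value = NormalDist(null_mean, standard_error).cdf(sample_mean)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if alpha > p_value else "do not reject H0")
```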

The p-value can easily be calculated using the TI-83+ and the TI-84 calculators: 

Press STAT and arrow over to TESTS. Press 1:Z-Test. Arrow over to Stats and press ENTER. Arrow 
down and enter 16.43 for μ0 (null hypothesis), .8 for σ, 16 for the sample mean, and 15 for n. Arrow 
down to μ: (alternate hypothesis) and arrow over to <μ0. Press ENTER. Arrow down to Calculate 
and press ENTER. The calculator not only calculates the p-value (p = 0.0187) but it also calculates 
the test statistic (z-score) for the sample mean. μ < 16.43 is the alternate hypothesis. Do this set 
of instructions again except arrow to Draw (instead of Calculate). Press ENTER. A shaded graph 
appears with z = −2.08 (test statistic) and p = 0.0187 (p-value). Make sure when you use Draw 
that no other equations are highlighted in Y = and the plots are turned off. 

When the calculator does a Z-Test, the Z-Test function finds the p-value by doing a normal 
probability calculation using the Central Limit Theorem: 

P(x̄ < 16) = 2nd DISTR normcdf(-10^99, 16, 16.43, 0.8/√15) 

The Type I and Type II errors for this problem are as follows: 

The Type I error is to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 
16.43 seconds when, in fact, he actually swims the 25-yard freestyle, on average, in 16.43 seconds. 
(Reject the null hypothesis when the null hypothesis is true.) 

The Type II error is that there is not evidence to conclude that Jeffrey swims the 25-yard freestyle, 
on average, in less than 16.43 seconds when, in fact, he actually does swim the 25-yard freestyle, 
on average, in less than 16.43 seconds. (Do not reject the null hypothesis when the null hypothesis 
is false.) 



Historical Note: The traditional way to compare the two probabilities, α and the p-value, is to compare 
the critical value (z-score from α) to the test statistic (z-score from data). The calculated test statistic for the 
p-value is −2.08. (From the Central Limit Theorem, the test statistic formula is z = (x̄ − μ_X)/(σ_X/√n). For 
this problem, x̄ = 16, μ_X = 16.43 from the null hypothesis, σ_X = 0.8, and n = 15.) You can find the critical 
value for α = 0.05 in the normal table (see 15.Tables in the Table of Contents). The z-score for an area to the 
left equal to 0.05 is midway between −1.65 and −1.64 (0.05 is midway between 0.0505 and 0.0495). The z-score is 
−1.645. Since −1.645 > −2.08 (which demonstrates that α > p-value), reject H0. Traditionally, the decision 
to reject or not reject was done in this way. Today, comparing the two probabilities α and the p-value is very 
common. For this problem, the p-value, 0.0187, is considerably smaller than α, 0.05. You can be confident 
about your decision to reject. The graph shows α, the p-value, the test statistic, and the critical value. 



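[Figure 9.2: standard normal curve showing α = 0.05 as the area to the left of the critical value −1.645 and 
the p-value = 0.0187 as the area to the left of the test statistic −2.08.] 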
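
The critical-value comparison in the historical note can be reproduced without a normal table. This sketch uses NormalDist.inv_cdf from the Python standard library in place of the table lookup. The variable names are illustrative, and the numbers are Jeffrey's from the example above.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05

# Critical value: the z-score with area alpha to its left (replaces the table lookup).
critical_z = NormalDist().inv_cdf(alpha)

# Test statistic from the data: z = (sample mean - null mean) / (sigma / sqrt(n)).
test_z = (16 - 16.43) / (0.8 / sqrt(15))

# Left-tailed test: reject H0 when the test statistic is left of the critical value.
reject = test_z < critical_z
print(f"critical value = {critical_z:.3f}, test statistic = {test_z:.2f}, reject H0: {reject}")
```

For a left-tailed test, comparing z-scores this way leads to the same decision as comparing α with the p-value, because the normal cdf is increasing.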



Example 9.12 

A college football coach thought that his players could bench press a mean weight of 275 pounds. 
It is known that the standard deviation is 55 pounds. Three of his players thought that the mean 
weight was more than that amount. They asked 30 of their teammates for their estimated maximum 
lift on the bench press exercise. The data ranged from 205 pounds to 385 pounds. The actual 
different weights were (frequencies are in parentheses) 205(3); 215(3); 225(1); 241(2); 252(2); 265(2); 
275(2); 313(2); 316(5); 338(2); 341(1); 345(2); 368(2); 385(1). (Source: data from Reuben Davis, Kraig 
Evans, and Scott Gunderson.) 

Conduct a hypothesis test using a 2.5% level of significance to determine if the bench press mean 
is more than 275 pounds. 

Solution 

Set up the Hypothesis Test: 

Since the problem is about a mean weight, this is a test of a single population mean. 

H0: μ = 275    Ha: μ > 275    This is a right-tailed test. 

Calculating the distribution needed: 

Random variable: X̄ = the mean weight, in pounds, lifted by the football players. 

Distribution for the test: It is normal because σ is known. 

X̄ ~ N(275, 55/√30) 

x̄ = 286.2 pounds (from the data). 

σ = 55 pounds (Always use σ if you know it.) We assume μ = 275 pounds unless our data shows 
us otherwise. 

Calculate the p-value using the normal distribution for a mean and using the sample mean as 
input (see the calculator instructions below for using the data as input): 

p-value = P(x̄ > 286.2) = 0.1323. 

Interpretation of the p-value: If H0 is true, then there is a 0.1323 probability (13.23%) that the 
football players can lift a mean weight of 286.2 pounds or more. Because a 13.23% chance is large 
enough, a mean weight lift of 286.2 pounds or more is not a rare event. 

[Figure 9.3: normal curve centered at 275 with the area to the right of x̄ = 286.2 shaded; p-value = 0.1323.] 

Compare α and the p-value: 
α = 0.025    p-value = 0.1323 

Make a decision: Since α < p-value, do not reject H0. 

Conclusion: At the 2.5% level of significance, from the sample data, there is not sufficient evidence 
to conclude that the true mean weight lifted is more than 275 pounds. 

The p-value can easily be calculated using the TI-83+ and the TI-84 calculators: 

Put the data and frequencies into lists. Press STAT and arrow over to TESTS. Press 1:Z-Test. Arrow 
over to Data and press ENTER. Arrow down and enter 275 for μ0, 55 for σ, the name of the list where 
you put the data, and the name of the list where you put the frequencies. Arrow down to μ: and 
arrow over to >μ0. Press ENTER. Arrow down to Calculate and press ENTER. The calculator not 
only calculates the p-value (p = 0.1331, a little different from the above calculation: in it we 
used the sample mean rounded to one decimal place instead of the data) but it also calculates the 
test statistic (z-score) for the sample mean, the sample mean, and the sample standard deviation. 
μ > 275 is the alternate hypothesis. Do this set of instructions again except arrow to Draw (instead 
of Calculate). Press ENTER. A shaded graph appears with z = 1.112 (test statistic) and p = 0.1331 
(p-value). Make sure when you use Draw that no other equations are highlighted in Y = and the 
plots are turned off. 
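
The small difference between 0.1323 and 0.1331 comes from rounding the sample mean. A sketch that works from the frequency data directly, as the calculator's Data option does, is shown below. The dictionary layout and variable names are illustrative choices. The weights and frequencies are those listed in the example.

```python
from math import sqrt
from statistics import NormalDist

# weight lifted (pounds) -> number of players reporting it
weights = {205: 3, 215: 3, 225: 1, 241: 2, 252: 2, 265: 2, 275: 2,
           313: 2, 316: 5, 338: 2, 341: 1, 345: 2, 368: 2, 385: 1}

n = sum(weights.values())
sample_mean = sum(weight * freq for weight, freq in weights.items()) / n

null_mean, sigma = 275, 55
z = (sample_mean - null_mean) / (sigma / sqrt(n))

# Right-tailed test: area to the right of the test statistic.
p_value = 1 - NormalDist().cdf(z)

print(f"n = {n}, sample mean = {sample_mean:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Using the unrounded mean of about 286.17 instead of 286.2 is what moves the p-value from 0.1323 to 0.1331.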



Example 9.13 

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor 
thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 
65; 65; 70; 67; 66; 63; 63; 68; 72; 71. He performs a hypothesis test using a 5% level of significance. 
The data are from a normal distribution. 

Solution 

Set up the Hypothesis Test: 

A 5% level of significance means that α = 0.05. This is a test of a single population mean. 

H0: μ = 65    Ha: μ > 65 

Since the instructor thinks the average score is higher, use a ">". The ">" means the test is 
right-tailed. 

Determine the distribution needed: 

Random variable: X̄ = average score on the first statistics test. 

Distribution for the test: If you read the problem carefully, you will notice that there is no 
population standard deviation given. You are only given n = 10 sample data values. Notice also 
that the data come from a normal distribution. This means that the distribution for the test is a 
Student's-t. 

Use t_df. Therefore, the distribution for the test is t_9 where n = 10 and df = 10 − 1 = 9. 

Calculate the p-value using the Student's-t distribution: 

p-value = P(x̄ > 67) = 0.0396 where the sample mean and sample standard deviation are 
calculated as 67 and 3.1972 from the data. 

Interpretation of the p-value: If the null hypothesis is true, then there is a 0.0396 probability 
(3.96%) that the sample mean is 67 or more. 

[Figure 9.4: Student's-t curve with μ = 65 at the center and the area to the right of x̄ = 67 shaded; 
p-value = 0.0396.] 

Compare a and the p-value: 

Since α = 0.05 and p-value = 0.0396, α > p-value. 

Make a decision: Since α > p-value, reject H0. 

This means you reject μ = 65. In other words, you believe the average test score is more than 65. 

Conclusion: At a 5% level of significance, the sample data show sufficient evidence that the mean 
(average) test score is more than 65, just as the statistics instructor thinks. 

The p-value can easily be calculated using the TI-83+ and the TI-84 calculators: 

Put the data into a list. Press STAT and arrow over to TESTS. Press 2:T-Test. Arrow over to 
Data and press ENTER. Arrow down and enter 65 for μ0, the name of the list where you put the 
data, and 1 for Freq:. Arrow down to μ: and arrow over to >μ0. Press ENTER. Arrow down 
to Calculate and press ENTER. The calculator not only calculates the p-value (p = 0.0396) but it 
also calculates the test statistic (t-score) for the sample mean, the sample mean, and the sample 
standard deviation. μ > 65 is the alternate hypothesis. Do this set of instructions again except 
arrow to Draw (instead of Calculate). Press ENTER. A shaded graph appears with t = 1.9781 (test 
statistic) and p = 0.0396 (p-value). Make sure when you use Draw that no other equations are 
highlighted in Y = and the plots are turned off. 
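
The t-test can also be sketched with the Python standard library alone. Python's statistics module has no Student's-t distribution, so the sketch below integrates the t density numerically with Simpson's rule. That numerical approach, and the helper names t_pdf and t_upper_tail, are choices made for this illustration and are not how the calculator is documented to work.

```python
import math
from statistics import mean, stdev

def t_pdf(u, df):
    """Density of the Student's-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area between 0 and t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]
n = len(scores)
sample_mean = mean(scores)
sample_sd = stdev(scores)  # sample standard deviation (divides by n - 1)

t_stat = (sample_mean - 65) / (sample_sd / math.sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

print(f"mean = {sample_mean}, s = {sample_sd:.4f}, t = {t_stat:.4f}, p-value = {p_value:.4f}")
```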



Example 9.14 

Joon believes that 50% of first-time brides in the United States are younger than their grooms. 
She performs a hypothesis test to determine if the percentage is the same or different from 50%. 
Joon samples 100 first-time brides and 53 reply that they are younger than their grooms. For the 
hypothesis test, she uses a 1% level of significance. 

Solution 

Set up the Hypothesis Test: 

The 1% level of significance means that α = 0.01. This is a test of a single population proportion. 

H0: p = 0.50    Ha: p ≠ 0.50 

The words "is the same or different from" tell you this is a two-tailed test. 

Calculate the distribution needed: 

Random variable: P' = the percent of first-time brides who are younger than their grooms. 

Distribution for the test: The problem contains no mention of a mean. The information is given 
in terms of percentages. Use the distribution for P', the estimated proportion. 

P' ~ N(p, √(p·q/n)). Therefore, P' ~ N(0.5, √(0.5·0.5/100)) where p = 0.50, q = 1 − p = 0.50, and 
n = 100. 

Calculate the p-value using the normal distribution for proportions: 

p-value = P(p' < 0.47 or p' > 0.53) = 0.5485 

where x = 53, p' = x/n = 53/100 = 0.53. 

Interpretation of the p-value: If the null hypothesis is true, there is 0.5485 probability (54.85%) 
that the sample (estimated) proportion p' is 0.53 or more OR 0.47 or less (see the graph below). 

[Figure 9.5: normal curve centered at p = 0.50 with 0.47 and 0.53 marked; the area to the left of 0.47 and 
the area to the right of 0.53 are shaded, and each equals (1/2)(p-value) = 0.27425.] 

μ = p = 0.50 comes from H0, the null hypothesis. 

p' = 0.53. Since the curve is symmetrical and the test is two-tailed, the p' for the left tail is equal to 
0.50 − 0.03 = 0.47 where μ = p = 0.50. (0.03 is the difference between 0.53 and 0.50.) 

Compare α and the p-value: 

Since α = 0.01 and p-value = 0.5485, α < p-value. 

Make a decision: Since α < p-value, you cannot reject H0. 

Conclusion: At the 1% level of significance, the sample data do not show sufficient evidence that 
the percentage of first-time brides that are younger than their grooms is different from 50%. 

The p-value can easily be calculated using the TI-83+ and the TI-84 calculators: 

Press STAT and arrow over to TESTS. Press 5:1-PropZTest. Enter .5 for p0, 53 for x and 100 for 
n. Arrow down to Prop and arrow to not equals p0. Press ENTER. Arrow down to Calculate 
and press ENTER. The calculator calculates the p-value (p = 0.5485) and the test statistic (z-score). 
Prop not equals .5 is the alternate hypothesis. Do this set of instructions again except arrow to 
Draw (instead of Calculate). Press ENTER. A shaded graph appears with z = 0.6 (test statistic) and 
p = 0.5485 (p-value). Make sure when you use Draw that no other equations are highlighted in 
Y = and the plots are turned off. 

The Type I and Type II errors are as follows: 

The Type I error is to conclude that the proportion of first-time brides that are younger than their 
grooms is different from 50% when, in fact, the proportion is actually 50%. (Reject the null hy- 
pothesis when the null hypothesis is true). 

The Type II error is that there is not enough evidence to conclude that the proportion of first-time brides 
that are younger than their grooms differs from 50% when, in fact, the proportion does differ from 
50%. (Do not reject the null hypothesis when the null hypothesis is false.) 
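
A two-tailed test of a single proportion can be sketched as follows. The function name one_prop_z_test_two_tailed is made up for illustration. The inputs are Joon's numbers from the example (53 successes out of 100, null proportion 0.50).

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test_two_tailed(successes, n, p0):
    """Return (p', z, p-value) for H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n                      # estimated proportion p'
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses the null value p0, not p'
    z = (p_hat - p0) / standard_error
    # Two-tailed: double the area beyond |z| in one tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value

p_hat, z, p_value = one_prop_z_test_two_tailed(successes=53, n=100, p0=0.50)
print(f"p' = {p_hat:.2f}, z = {z:.1f}, p-value = {p_value:.4f}")
```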



Example 9.15 
Problem 1 

Suppose a consumer group suspects that the proportion of households that have three cell phones 
is 30%. A cell phone company has reason to believe that the proportion is not 30%. Before they start 
a big advertising campaign, they conduct a hypothesis test. Their marketing people survey 150 
households with the result that 43 of the households have three cell phones. 

Solution 

Set up the Hypothesis Test: 

H0: p = 0.30    Ha: p ≠ 0.30 

Determine the distribution needed: 

The random variable is P' = proportion of households that have three cell phones. 

The distribution for the hypothesis test is P' ~ N(0.30, √((0.30)(0.70)/150)) 

Problem 2 

The value that helps determine the p-value is p' . Calculate p' . 

Problem 3 

What is a success for this problem? 

Problem 4 

What is the level of significance? 

Draw the graph for this problem. Draw the horizontal axis. Label and shade appropriately. 

Problem 5 

Calculate the p-value. 

Problem 6 

Make a decision. (Reject/Do not reject) H0 because ______. 



The next example is a poem written by a statistics student named Nicole Hart. The solution to the problem 
follows the poem. Notice that the hypothesis test is for a single population proportion. This means that the 
null and alternate hypotheses use the parameter p. The distribution for the test is normal. The estimated 
proportion p' is the proportion of fleas killed to the total fleas found on Fido. This is sample information. 
The problem gives a preconceived α = 0.01, for comparison, and a 95% confidence interval computation. 
The poem is clever and humorous, so please enjoy it! 

NOTE: Hypothesis testing problems consist of multiple steps. To help you do the problems, so- 
lution sheets are provided for your use. Look in the Table of Contents Appendix for the topic 
"Solution Sheets." If you like, use copies of the appropriate solution sheet for homework prob- 
lems. 

Example 9.16 

My dog has so many fleas, 
They do not come off with ease. 
As for shampoo, I have tried many types 
Even one called Bubble Hype, 
Which only killed 25% of the fleas, 
Unfortunately I was not pleased. 

I've used all kinds of soap, 
Until I had give up hope 
Until one day I saw 
An ad that put me in awe. 

A shampoo used for dogs 
Called GOOD ENOUGH to Clean a Hog 
Guaranteed to kill more fleas. 

I gave Fido a bath 
And after doing the math 
His number of fleas 
Started dropping by 3's! 

Before his shampoo 
I counted 42. 
At the end of his bath, 
I redid the math 
And the new shampoo had killed 17 fleas. 
So now I was pleased. 

Now it is time for you to have some fun 
With the level of significance being .01, 
You must help me figure out 
Use the new shampoo or go without? 

Solution 

Set up the Hypothesis Test: 

H0: p = 0.25    Ha: p > 0.25 

Determine the distribution needed: 

In words, CLEARLY state what your random variable X̄ or P' represents. 

P' = The proportion of fleas that are killed by the new shampoo 

State the distribution to use for the test. 

Normal: N(0.25, √((0.25)(1 − 0.25)/42)) 



Test Statistic: z = 2.3163 

Calculate the p-value using the normal distribution for proportions: 

p-value = 0.0103 

In 1 - 2 complete sentences, explain what the p-value means for this problem. 

If the null hypothesis is true (the proportion is 0.25), then there is a 0.0103 probability that the 
sample (estimated) proportion is 0.4048 (17/42) or more. 

Use the previous information to sketch a picture of this situation. CLEARLY, label and scale the 
horizontal axis and shade the region(s) corresponding to the p-value. 




[Figure 9.6: normal curve with 0.25 and 17/42 marked on the horizontal axis and the area to the right of 
17/42 shaded; test statistic for 17/42: z = 2.3163.] 



Compare α and the p-value: 

Indicate the correct decision ("reject" or "do not reject" the null hypothesis), the reason for it, and 
write an appropriate conclusion, using COMPLETE SENTENCES. 

alpha    decision            reason for decision 
0.01     Do not reject H0    α < p-value 

Table 9.3 

Conclusion: At the 1% level of significance, the sample data do not show sufficient evidence that 
the percentage of fleas that are killed by the new shampoo is more than 25%. 

Construct a 95% Confidence Interval for the true mean or proportion. Include a sketch of the 
graph of the situation. Label the point estimate and the lower and upper bounds of the Confidence 
Interval. 

[Figure 9.7: number line marking the lower bound 0.26, the point estimate 17/42, and the upper bound 0.55.] 

Confidence Interval: (0.26, 0.55) We are 95% confident that the true population proportion p of 
fleas that are killed by the new shampoo is between 26% and 55%. 
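
Both the test and the confidence interval for the flea problem can be reproduced with a short sketch. The variable names are illustrative. The interval below uses the usual normal-approximation formula p' ± z*·√(p'q'/n) with z* taken from the standard normal distribution, which is an assumption about how the interval (0.26, 0.55) was obtained.

```python
from math import sqrt
from statistics import NormalDist

killed, n, p0 = 17, 42, 0.25
p_hat = killed / n                                  # estimated proportion p'

# Right-tailed test of H0: p = 0.25 against Ha: p > 0.25.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)          # standard error uses p0
p_value = 1 - NormalDist().cdf(z)

# 95% confidence interval: standard error uses p', not p0.
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note the two different standard errors: the test assumes the null value 0.25, while the interval is built around the sample proportion.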

NOTE: This test result is not very definitive since the p-value is very close to alpha. In reality, one 
would probably do more tests by giving the dog another bath after the fleas have had a chance to 
return. 

9.12 Summary of Formulas 12 

H0 and Ha are contradictory. 

If H0 has:      equal (=)                        greater than or equal to (≥)    less than or equal to (≤) 
then Ha has:    not equal (≠) or greater than    less than (<)                   greater than (>) 
                (>) or less than (<) 

Table 9.4 

If α ≤ p-value, then do not reject H0. 

If α > p-value, then reject H0. 

α is preconceived. Its value is set before the hypothesis test starts. The p-value is calculated from the data. 

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the 
null hypothesis is true. 

β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when 
the null hypothesis is false. 

If there is no given preconceived α, then use α = 0.05. 

Types of Hypothesis Tests 

• Single population mean, known population variance (or standard deviation): Normal test. 

• Single population mean, unknown population variance (or standard deviation): Student's-t test. 

• Single population proportion: Normal test. 



12 This content is available online at <http://cnx.org/content/m16996/1.9/>. 

9.13 Practice 1: Single Mean, Known Population Standard Deviation 13 

9.13.1 Student Learning Outcomes 

• The student will conduct a hypothesis test of a single mean with known population standard devia- 
tion. 



9.13.2 Given 

Suppose that a recent article stated that the mean time spent in jail by a first-time convicted burglar is 2.5 
years. A study was then done to see if the mean time has increased in the new century. A random sample 
of 26 first-time convicted burglars in a recent year was picked. The mean length of time in jail from the 
survey was 3 years with a standard deviation of 1.8 years. Suppose that it is somehow known that the 
population standard deviation is 1.5. Conduct a hypothesis test to determine if the mean length of jail time 
has increased. The distribution of the population is normal. 

9.13.3 Hypothesis Testing: Single Mean 

Exercise 9.13.1 (Solution on p. 407.) 

Is this a test of means or proportions? 

Exercise 9.13.2 (Solution on p. 407.) 

State the null and alternative hypotheses. 

a. H0: 

b. Ha: 

Exercise 9.13.3 (Solution on p. 407.) 

Is this a right-tailed, left-tailed, or two-tailed test? How do you know? 

Exercise 9.13.4 (Solution on p. 407.) 

What symbol represents the Random Variable for this test? 

Exercise 9.13.5 (Solution on p. 407.) 

In words, define the Random Variable for this test. 

Exercise 9.13.6 (Solution on p. 407.) 

Is the population standard deviation known and, if so, what is it? 

Exercise 9.13.7 (Solution on p. 407.) 

Calculate the following: 

a. x̄ = 

b. σ = 

c. s_x = 

d. n = 

Exercise 9.13.8 (Solution on p. 407.) 

Since both σ and s_x are given, which should be used? In 1 - 2 complete sentences, explain why. 

Exercise 9.13.9 (Solution on p. 407.) 

State the distribution to use for the hypothesis test. 

Exercise 9.13.10 

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized mean and the 
sample mean x̄. Shade the area corresponding to the p-value. 



13 This content is available online at <http://cnx.org/content/m17004/1.11/>. 

Exercise 9.13.11 (Solution on p. 407.) 

Find the p-value. 

Exercise 9.13.12 (Solution on p. 407.) 

At a pre-conceived α = 0.05, what is your: 

a. Decision: 

b. Reason for the decision: 

c. Conclusion (write out in a complete sentence): 



9.13.4 Discussion Questions 

Exercise 9.13.13 

Does it appear that the mean jail time spent for first time convicted burglars has increased? Why 
or why not? 

9.14 Practice 2: Single Mean, Unknown Population Standard Deviation 

9.14.1 Student Learning Outcomes 

• The student will conduct a hypothesis test of a single mean with unknown population standard de- 
viation. 



9.14.2 Given 

A random survey of 75 death row inmates revealed that the mean length of time on death row is 17.4 years 
with a standard deviation of 6.3 years. Conduct a hypothesis test to determine if the population mean time 
on death row could likely be 15 years. 



(Solution on p. 407.) 
(Solution on p. 407.) 



9.14.3 Hypothesis Testing: Single Mean 

Exercise 9.14.1 

Is this a test of means or proportions? 

Exercise 9.14.2 

State the null and alternative hypotheses. 

a. H : 

b. H a : 

Exercise 9.14.3 (Solution on p. 407.)

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

Exercise 9.14.4 (Solution on p. 407.)

What symbol represents the Random Variable for this test?

Exercise 9.14.5 (Solution on p. 407.)

In words, define the Random Variable for this test.

Exercise 9.14.6 (Solution on p. 407.)

Is the population standard deviation known and, if so, what is it?

Exercise 9.14.7 (Solution on p. 407.)

Calculate the following:

a. x̄ =

b. 6.3 =

c. n =

Exercise 9.14.8 (Solution on p. 408.)

Which test should be used? In 1-2 complete sentences, explain why.

Exercise 9.14.9 (Solution on p. 408.)

State the distribution to use for the hypothesis test.

Exercise 9.14.10

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized mean and the
sample mean, x̄. Shade the area corresponding to the p-value.

Figure 9.8



14 This content is available online at <http://cnx.org/content/m17016/1.12/>.



Exercise 9.14.11 (Solution on p. 408.)

Find the p-value.

Exercise 9.14.12 (Solution on p. 408.)

At a pre-conceived α = 0.05, what is your:

a. Decision:

b. Reason for the decision:

c. Conclusion (write out in a complete sentence):



9.14.4 Discussion Question 

Does it appear that the mean time on death row could be 15 years? Why or why not? 
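
One way to check the arithmetic in this practice is sketched below in Python. It is an illustration under stated assumptions, not part of the original solution sheet: the question "could the mean be 15 years?" is read as a two-tailed test, the Student's t-distribution with n - 1 = 74 degrees of freedom is used because only the sample standard deviation is given, and the helper names t_pdf and t_cdf are hypothetical. The cumulative probability is approximated with Simpson's rule so that only the standard library is needed.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area


# Values from the "Given" paragraph of this practice.
n, x_bar, s, mu_0 = 75, 17.4, 6.3, 15.0

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1
# Two-tailed p-value (assumption: Ha is "the mean is not 15 years").
p_value = 2 * (1 - t_cdf(abs(t_stat), df))

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Comparing the printed p-value with α = 0.05 gives the decision; a calculator's t-test function should agree to within rounding.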
9.15 Practice 3: Single Proportion 15
9.15.1 Student Learning Outcomes 

• The student will conduct a hypothesis test of a single population proportion. 



9.15.2 Given 

The National Institute of Mental Health published an article stating that in any one-year period,
approximately 9.5 percent of American adults suffer from depression or a depressive illness.
(http://www.nimh.nih.gov/publicat/depression.cfm) Suppose that in a survey of 100 people in a certain
town, seven of them suffered from depression or a depressive illness. Conduct a hypothesis test to
determine if the true proportion of people in that town suffering from depression or a depressive
illness is lower than the percent in the general adult American population.



9.15.3 Hypothesis Testing: Single Proportion

Exercise 9.15.1 (Solution on p. 408.)

Is this a test of means or proportions?

Exercise 9.15.2 (Solution on p. 408.)

State the null and alternative hypotheses.

a. H₀:

b. Hₐ:

Exercise 9.15.3 (Solution on p. 408.)

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

Exercise 9.15.4 (Solution on p. 408.)

What symbol represents the Random Variable for this test?

Exercise 9.15.5 (Solution on p. 408.)

In words, define the Random Variable for this test.

Exercise 9.15.6 (Solution on p. 408.)

Calculate the following:

a. x =

b. n =

c. p' =

Exercise 9.15.7 (Solution on p. 408.)

Calculate σ_p'. Make sure to show how you set up the formula.

Exercise 9.15.8 (Solution on p. 408.)

State the distribution to use for the hypothesis test.

Exercise 9.15.9

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized mean and the
sample proportion, p-hat. Shade the area corresponding to the p-value.



15 This content is available online at <http://cnx.org/content/m17003/1.15/>.

Exercise 9.15.10 (Solution on p. 408.)

Find the p-value.

Exercise 9.15.11 (Solution on p. 408.) 

At a pre-conceived α = 0.05, what is your:

a. Decision: 

b. Reason for the decision: 

c. Conclusion (write out in a complete sentence): 



9.15.4 Discussion Question

Exercise 9.15.12 

Does it appear that the proportion of people in that town with depression or a depressive illness
is lower than in the general adult American population? Why or why not?
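
The steps of this practice can be traced in a few lines of Python. This is a sketch under assumptions rather than the book's own solution: it uses the normal approximation to the distribution of the sample proportion (reasonable here because n·p = 9.5 and n·(1 - p) = 90.5 both exceed 5), treats the test as left-tailed because the question asks whether the town's proportion is lower, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

p_0 = 0.095    # hypothesized proportion (9.5 percent of American adults)
x, n = 7, 100  # seven of the 100 people surveyed

p_prime = x / n                            # sample proportion
sigma_p = math.sqrt(p_0 * (1 - p_0) / n)   # standard deviation of p' under H0
z = (p_prime - p_0) / sigma_p              # standardized test statistic
p_value = NormalDist().cdf(z)              # left-tailed: P(P' <= 0.07)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p' = {p_prime:.2f}, sigma = {sigma_p:.4f}, z = {z:.3f}, "
      f"p-value = {p_value:.4f}, decision: {decision}")
```

Changing p_0, x, and n adapts the same set-up to the other single-proportion problems in this chapter; for a right-tailed test the p-value would be 1 minus the cumulative probability instead.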
9.16 Homework 16

Exercise 9.16.1 (Solution on p. 408.) 

Some of the statements below refer to the null hypothesis, some to the alternate hypothesis.

State the null hypothesis, H₀, and the alternative hypothesis, Hₐ, in terms of the appropriate
parameter (μ or p).

a. The mean number of years Americans work before retiring is 34. 

b. At most 60% of Americans vote in presidential elections. 

c. The mean starting salary for San Jose State University graduates is at least $100,000 per year. 

d. 29% of high school seniors get drunk each month. 

e. Fewer than 5% of adults ride the bus to work in Los Angeles. 

f. The mean number of cars a person owns in her lifetime is not more than 10. 

g. About half of Americans prefer to live away from cities, given the choice.

h. Europeans have a mean paid vacation each year of six weeks.

i. The chance of developing breast cancer is under 11% for women.

j. Private universities' mean tuition cost is more than $20,000 per year.

Exercise 9.16.2 (Solution on p. 409.) 

For (a) - (j) above, state the Type I and Type II errors in complete sentences. 

Exercise 9.16.3 

For (a) - (j) above, in complete sentences: 



a. State a consequence of committing a Type I error. 

b. State a consequence of committing a Type II error. 



DIRECTIONS: For each of the word problems, use a solution sheet to do the hypothesis test. The 
solution sheet is found in 14. Appendix (online book version: the link is "Solution Sheets"; PDF 
book version: look under 14.5 Solution Sheets). Please feel free to make copies of the solution 
sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files. 

NOTE: If you are using a Student's t-distribution for a homework problem below, you may assume
that the underlying population is normally distributed. (In general, you must first prove that 
assumption, though.) 

Exercise 9.16.4 

A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it needs 
to be replaced. From past studies of this tire, the standard deviation is known to be 8000. A survey 
of owners of that tire design is conducted. From the 28 tires surveyed, the mean lifespan was 
46,500 miles with a standard deviation of 9800 miles. Do the data support the claim at the 5% 
level? 
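
A possible set-up for this exercise is sketched below. It rests on assumptions that are worth checking against your own solution sheet: because the population standard deviation is stated as known (8000 miles), the sketch uses it with the normal distribution and does not use the sample standard deviation; the claim "at least 50,000 miles" is taken as H₀: μ ≥ 50,000 with Hₐ: μ < 50,000; and the function name z_test_mean is hypothetical.

```python
import math
from statistics import NormalDist


def z_test_mean(x_bar, mu_0, sigma, n, tail):
    """Return (z, p_value) for a single-mean z-test.

    tail is 'left', 'right', or 'two' and names the direction of Ha.
    """
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Tire data from the exercise: 28 tires, sample mean 46,500 miles.
z, p_value = z_test_mean(x_bar=46_500, mu_0=50_000, sigma=8_000, n=28, tail="left")

alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The same function covers other known-σ problems in this homework by changing the arguments and the tail.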

Exercise 9.16.5 (Solution on p. 409.) 

From generation to generation, the mean age when smokers first start to smoke varies. However,
the standard deviation of that age remains constant at around 2.1 years. A survey of 40 smokers
of this generation was done to see if the mean starting age is at least 19. The sample mean was 
18.1 with a sample standard deviation of 1.3. Do the data support the claim at the 5% level? 



16 This content is available online at <http://cnx.org/content/m17001/1.14/>.

Exercise 9.16.6

The cost of a daily newspaper varies from city to city. However, the variation among prices
remains steady with a standard deviation of 20¢. A study was done to test the claim that the mean
cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation
of 18¢. Do the data support the claim at the 1% level?

Exercise 9.16.7 (Solution on p. 409.) 

An article in the San Jose Mercury News stated that students in the California state university 
system take 4.5 years, on average, to finish their undergraduate degrees. Suppose you believe that 
the mean time is longer. You conduct a survey of 49 students and obtain a sample mean of 5.1 with 
a sample standard deviation of 1.2. Do the data support your claim at the 1% level? 

Exercise 9.16.8 

The mean number of sick days an employee takes per year is believed to be about 10. Members 
of a personnel department do not believe this figure. They randomly survey 8 employees. The 
number of sick days they took for the past year are as follows: 12; 4; 15; 3; 11; 8; 6; 8. Let x = the 
number of sick days they took for the past year. Should the personnel team believe that the mean 
number is about 10? 
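
When an exercise supplies raw data, as this one does, the sample mean and sample standard deviation must be computed before the test statistic. The sketch below shows that step only; it assumes a Student's t-test with n - 1 = 7 degrees of freedom (the population standard deviation is not given) and leaves the p-value to a calculator or t-table.

```python
import math
import statistics

sick_days = [12, 4, 15, 3, 11, 8, 6, 8]  # data listed in the exercise
mu_0 = 10                                # hypothesized mean

n = len(sick_days)
x_bar = statistics.mean(sick_days)
s = statistics.stdev(sick_days)          # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, df = {n - 1}")
```

Because the personnel department simply doubts the figure of 10, a two-tailed alternative is the natural reading, but that choice is yours to justify on the solution sheet.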

Exercise 9.16.9 (Solution on p. 409.) 

In 1955, Life Magazine reported that the 25-year-old mother of three worked, on average, an
80-hour week. Recently, many groups have been studying whether or not the women's movement
has, in fact, resulted in an increase in the average work week for women (combining employment
and at-home work). Suppose a study was done to determine if the mean work week has increased.
81 women were surveyed with the following results. The sample mean was 83; the sample
standard deviation was 10. Does it appear that the mean work week has increased for women at the
5% level?

Exercise 9.16.10 

Your statistics instructor claims that 60 percent of the students who take her Elementary Statistics 
class go through life feeling more enriched. For some reason that she can't quite figure out, most 
people don't believe her. You decide to check this out on your own. You randomly survey 64 of 
her past Elementary Statistics students and find that 34 feel more enriched as a result of her class. 
Now, what do you think? 

Exercise 9.16.11 (Solution on p. 409.) 

A Nissan Motor Corporation advertisement read, "The average man's I.Q. is 107. The average 
brown trout's I.Q. is 4. So why can't man catch brown trout?" Suppose you believe that the brown 
trout's mean I.Q. is greater than 4. You catch 12 brown trout. A fish psychologist determines the 
I.Q.s as follows: 5; 4; 7; 3; 6; 4; 5; 3; 6; 3; 8; 5. Conduct a hypothesis test of your belief. 

Exercise 9.16.12 

Refer to the previous problem. Conduct a hypothesis test to see if your decision and conclusion 
would change if your belief were that the brown trout's mean I.Q. is not 4. 

Exercise 9.16.13 (Solution on p. 409.) 

According to an article in Newsweek, the natural ratio of girls to boys is 100:105. In China, the
birth ratio is 100:114 (46.7% girls). Suppose you don't believe the reported figures of the percent
of girls born in China. You conduct a study. In this study, you count the number of girls and boys 
born in 150 randomly chosen recent births. There are 60 girls and 90 boys born of the 150. Based 
on your study, do you believe that the percent of girls born in China is 46.7? 

Exercise 9.16.14 

A poll done for Newsweek found that 13% of Americans have seen or sensed the presence of an 
angel. A contingent doubts that the percent is really that high. It conducts its own survey. Out 
of 76 Americans surveyed, only 2 had seen or sensed the presence of an angel. As a result of the
contingent's survey, would you agree with the Newsweek poll? In complete sentences, also give
three reasons why the two polls might give different results. 

Exercise 9.16.15 (Solution on p. 409.) 

The mean work week for engineers in a start-up company is believed to be about 60 hours. A 
newly hired engineer hopes that it's shorter. She asks 10 engineering friends in start-ups for the 
lengths of their mean work weeks. Based on the results that follow, should she count on the mean 
work week to be shorter than 60 hours? 

Data (length of mean work week): 70; 45; 55; 60; 65; 55; 55; 60; 50; 55. 

Exercise 9.16.16 

Use the "Lap time" data for Lap 4 (see Table of Contents) to test the claim that Terri finishes Lap 
4, on average, in less than 129 seconds. Use all twenty races given. 

Exercise 9.16.17 

Use the "Initial Public Offering" data (see Table of Contents) to test the claim that the mean offer 
price was $18 per share. Do not use all the data. Use your random number generator to randomly 
survey 15 prices. 



NOTE: The following questions were written by past students. They are excellent problems! 

Exercise 9.16.18 

"Asian Family Reunion" by Chau Nguyen

Every two years it comes around 
We all get together from different towns. 
In my honest opinion 
It's not a typical family reunion 
Not forty, or fifty, or sixty, 
But how about seventy companions!
The kids would play, scream, and shout 
One minute they're happy, another they'll pout. 
The teenagers would look, stare, and compare 
From how they look to what they wear. 
The men would chat about their business 
That they make more, but never less. 
Money is always their subject 
And there's always talk of more new projects. 
The women get tired from all of the chats 
They head to the kitchen to set out the mats. 
Some would sit and some would stand 
Eating and talking with plates in their hands. 
Then come the games and the songs 
And suddenly, everyone gets along! 
With all that laughter, it's sad to say 
That it always ends in the same old way.
They hug and kiss and say "good-bye" 
And then they all begin to cry! 
I say that 60 percent shed their tears 
But my mom counted 35 people this year. 

She said that boys and men will always have their pride, 
So we won't ever see them cry. 
I myself don't think she's correct,
So could you please try this problem to see if you object?

Exercise 9.16.19 (Solution on p. 409.) 

"The Problem with Angels" by Cyndy Dowling 

Although this problem is wholly mine, 
The catalyst came from the magazine, Time. 
On the magazine cover I did find 
The realm of angels tickling my mind. 

Inside, 69% I found to be
In angels, Americans do believe. 

Then, it was time to rise to the task, 
Ninety-five high school and college students I did ask. 
Viewing all as one group, 
Random sampling to get the scoop. 

So, I asked each to be true, 
"Do you believe in angels?" Tell me, do! 

Hypothesizing at the start, 
Totally believing in my heart 
That the proportion who said yes 
Would be equal on this test.

Lo and behold, seventy-three did arrive, 
Out of the sample of ninety-five. 
Now your job has just begun, 
Solve this problem and have some fun. 

Exercise 9.16.20 

"Blowing Bubbles" by Sondra Prull 

Studying stats just made me tense, 
I had to find some sane defense. 
Some light and lifting simple play 
To float my math anxiety away. 

Blowing bubbles lifts me high 
Takes my troubles to the sky. 
POIK! They're gone, with all my stress 
Bubble therapy is the best.

The label said each time I blew 
The average number of bubbles would be at least 22. 
I blew and blew and this I found 
From 64 blows, they all are round!

But the number of bubbles in 64 blows
Varied widely, this I know. 
20 per blow became the mean 
They deviated by 6, and not 16. 

From counting bubbles, I sure did relax 
But now I give to you your task. 
Was 22 a reasonable guess? 
Find the answer and pass this test! 

Exercise 9.16.21 (Solution on p. 410.)

"Dalmatian Darnation" by Kathy Sparling

A greedy dog breeder named Spreckles
Bred puppies with numerous freckles
The Dalmatians he sought
Possessed spot upon spot
The more spots, he thought, the more shekels.

His competitors did not agree
That freckles would increase the fee.
They said, "Spots are quite nice
But they don't affect price;
One should breed for improved pedigree."

The breeders decided to prove
This strategy was a wrong move.
Breeding only for spots
Would wreak havoc, they thought.
His theory they want to disprove.

They proposed a contest to Spreckles
Comparing dog prices to freckles.
In records they looked up
One hundred one pups:
Dalmatians that fetched the most shekels.

They asked Mr. Spreckles to name
An average spot count he'd claim
To bring in big bucks.
Said Spreckles, "Well, shucks,
It's for one hundred one that I aim."

Said an amateur statistician
Who wanted to help with this mission.
"Twenty-one for the sample
Standard deviation's ample:

They examined one hundred and one
Dalmatians that fetched a good sum. 
They counted each spot , 
Mark, freckle and dot 
And tallied up every one. 

Instead of one hundred one spots 
They averaged ninety six dots 
Can they muzzle Spreckles' 
Obsession with freckles 
Based on all the dog data they've got? 

Exercise 9.16.22 

"Macaroni and Cheese, please!!" by Nedda Misherghi and Rachelle Hall 

As a poor starving student I don't have much money to spend for even the bare necessities. So 
my favorite and main staple food is macaroni and cheese. It's high in taste and low in cost and 
nutritional value. 

One day, as I sat down to determine the meaning of life, I got a serious craving for this, oh, so 
important, food of my life. So I went down the street to Greatway to get a box of macaroni and 
cheese, but it was SO expensive! $2.02 !!! Can you believe it? It made me stop and think. The 
world is changing fast. I had thought that the mean cost of a box (the normal size, not some super- 
gigantic-family-value-pack) was at most $1, but now I wasn't so sure. However, I was determined 
to find out. I went to 53 of the closest grocery stores and surveyed the prices of macaroni and 
cheese. Here are the data I wrote in my notebook: 

Price per box of Mac and Cheese: 

• 5 stores @ $2.02

• 15 stores @ $0.25

• 3 stores @ $1.29

• 6 stores @ $0.35

• 4 stores @ $2.27

• 7 stores @ $1.50

• 5 stores @ $1.89

• 8 stores @ $0.75

I could see that the costs varied but I had to sit down to figure out whether or not I was right. If 
it does turn out that this mouth-watering dish is at most $1, then I'll throw a big cheesy party in 
our next statistics lab, with enough macaroni and cheese for just me. (After all, as a poor starving 
student I can't be expected to feed our class of animals!) 
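
Because the notebook data are grouped by price, the sample statistics have to be built from a frequency table. The sketch below shows one way to do that and to form the test statistic; it assumes the belief "at most $1" is the null hypothesis (H₀: μ ≤ 1.00, Hₐ: μ > 1.00) and that a Student's t-test with 52 degrees of freedom is appropriate. The variable names are illustrative.

```python
import math

# (price in dollars, number of stores) pairs from the notebook
price_counts = [
    (2.02, 5), (0.25, 15), (1.29, 3), (0.35, 6),
    (2.27, 4), (1.50, 7), (1.89, 5), (0.75, 8),
]
mu_0 = 1.00  # hypothesized mean price

n = sum(count for _, count in price_counts)
x_bar = sum(price * count for price, count in price_counts) / n
# Sample variance: weighted squared deviations divided by n - 1
variance = sum(count * (price - x_bar) ** 2 for price, count in price_counts) / (n - 1)
s = math.sqrt(variance)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"n = {n}, mean = ${x_bar:.4f}, s = ${s:.4f}, t = {t_stat:.3f}, df = {n - 1}")
```

The store counts add up to the 53 stores mentioned in the story, which is a quick check that the table was entered correctly.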

Exercise 9.16.23 (Solution on p. 410.) 

"William Shakespeare: The Tragedy of Hamlet, Prince of Denmark" by Jacqueline Ghodsi 

THE CHARACTERS (in order of appearance): 

• HAMLET, Prince of Denmark and student of Statistics 

• POLONIUS, Hamlet's tutor 

• HORATIO, friend to Hamlet and fellow student

Scene: The great library of the castle, in which Hamlet does his lessons 
Act I 






(The day is fair, but the face of Hamlet is clouded. He paces the large room. His tutor, Polonius, is 
reprimanding Hamlet regarding the latter's recent experience. Horatio is seated at the large table 
at right stage.) 

POLONIUS: My Lord, how cans't thou admit that thou hast seen a ghost! It is but a figment of 
your imagination! 

HAMLET: I beg to differ; I know of a certainty that five-and-seventy in one hundred of us,
condemned to the whips and scorns of time as we are, have gazed upon a spirit of health, or goblin
damn'd, be their intents wicked or charitable.

POLONIUS: If thou doest insist upon thy wretched vision then let me invest your time; be true
to thy work and speak to me through the reason of the null and alternate hypotheses. (He turns
to Horatio.) Did not Hamlet himself say, "What piece of work is man, how noble in reason, how
infinite in faculties?" Then let not this foolishness persist. Go, Horatio, make a survey of
three-and-sixty and discover what the true proportion be. For my part, I will never succumb to this
fantasy, but deem man to be devoid of all reason should thy proposal of at least five-and-seventy
in one hundred hold true.

HORATIO (to Hamlet): What should we do, my Lord? 

HAMLET: Go to thy purpose, Horatio. 

HORATIO: To what end, my Lord? 

HAMLET: That you must teach me. But let me conjure you by the rights of our fellowship, by the
consonance of our youth, by the obligation of our ever-preserved love, be even and direct with
me, whether I am right or no.

(Horatio exits, followed by Polonius, leaving Hamlet to ponder alone.) 

Act II 

(The next day, Hamlet awaits anxiously the presence of his friend, Horatio. Polonius enters and 
places some books upon the table just a moment before Horatio enters.) 

POLONIUS: So, Horatio, what is it thou didst reveal through thy deliberations? 

HORATIO: In a random survey, for which purpose thou thyself sent me forth, I did discover that 
one-and-forty believe fervently that the spirits of the dead walk with us. Before my God, I might 
not this believe, without the sensible and true avouch of mine own eyes. 

POLONIUS: Give thine own thoughts no tongue, Horatio. (Polonius turns to Hamlet.) But look 
to't I charge you, my Lord. Come Horatio, let us go together, for this is not our test. (Horatio and 
Polonius leave together.) 

HAMLET: To reject, or not reject, that is the question: whether 'tis nobler in the mind to suffer the 
slings and arrows of outrageous statistics, or to take arms against a sea of data, and, by opposing, 
end them. (Hamlet resignedly attends to his task.) 

(Curtain falls) 

Exercise 9.16.24

"Untitled" by Stephen Chen

I've often wondered how software is released and sold to the public. Ironically, I work for a
company that sells products with known problems. Unfortunately, most of the problems are difficult
to create, which makes them difficult to fix. I usually use the test program X, which tests the
product, to try to create a specific problem. When the test program is run to make an error occur,
the likelihood of generating an error is 1%.

So, armed with this knowledge, I wrote a new test program Y that will generate the same error that 
test program X creates, but more often. To find out if my test program is better than the original, 
so that I can convince the management that I'm right, I ran my test program to find out how often 
I can generate the same error. When I ran my test program 50 times, I generated the error twice. 
While this may not seem much better, I think that I can convince the management to use my test 
program instead of the original test program. Am I right? 
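
With a hypothesized error rate of 1% and only 50 runs, the expected number of errors is n·p = 0.5, which is below the usual threshold of 5 for the normal approximation. One alternative, sketched here as an assumption and not as the method this chapter prescribes, is to compute the p-value directly from the binomial distribution: the probability of seeing two or more errors in 50 runs if the true rate were still 1%. The function name binom_upper_tail is hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_0 = 0.01          # error rate of the original test program X
n, errors = 50, 2   # test program Y: 2 errors in 50 runs

# Right-tailed test: H0: p <= 0.01, Ha: p > 0.01
p_value = binom_upper_tail(errors, n, p_0)

print(f"expected errors under H0 = {n * p_0}, p-value = {p_value:.4f}")
```

Compare the p-value with your chosen significance level to decide whether management is likely to be convinced.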

Exercise 9.16.25 (Solution on p. 410.) 

Japanese Girls' Names 

by Kumi Furuichi 

It used to be very typical for Japanese girls' names to end with "ko." (The trend might have 
started around my grandmothers' generation and its peak might have been around my mother's 
generation.) "Ko" means "child" in Chinese character. Parents would name their daughters with 
"ko" attaching to other Chinese characters which have meanings that they want their daughters 
to become, such as Sachiko - a happy child, Yoshiko - a good child, Yasuko - a healthy child, and 
so on. 

However, I noticed recently that only two out of nine of my Japanese girlfriends at this school have 
names which end with "ko." More and more, parents seem to have become creative, modernized, 
and, sometimes, westernized in naming their children. 

I have a feeling that, while 70 percent or more of my mother's generation would have names with
"ko" at the end, the proportion has dropped among my peers. I wrote down all my Japanese
friends', ex-classmates', co-workers', and acquaintances' names that I could remember. Below are
the names. (Some are repeats.) Test to see if the proportion has dropped for this generation. 

Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko,
Hiroko, Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko,
Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko, Maki, 
Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko, 
Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka, Sayoko, 
Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko, Yasuko, 
Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki, Yukiko, Yuko, 
Yuko. 
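
Tallying the names by hand is error-prone, so a short script can do the counting before the test is set up. The sketch below is illustrative only: the list is transcribed from the exercise (with the line-break hyphen in "Hiroko" removed), repeats are kept as the author states, the test is assumed to be left-tailed with H₀: p ≥ 0.70, and the normal approximation for a sample proportion is assumed to be adequate.

```python
import math
from statistics import NormalDist

raw = """
Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko,
Hiroko, Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo,
Kayoko, Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko,
Maki, Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko,
Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka,
Sayoko, Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko,
Yasuko, Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki,
Yukiko, Yuko, Yuko.
"""

# Split on commas, then trim whitespace and the final period.
names = [piece.strip(" .\n") for piece in raw.split(",")]
names = [name for name in names if name]

n = len(names)
ko_count = sum(name.lower().endswith("ko") for name in names)

p_0 = 0.70                                  # hypothesized proportion
p_prime = ko_count / n                      # sample proportion
sigma_p = math.sqrt(p_0 * (1 - p_0) / n)
z = (p_prime - p_0) / sigma_p
p_value = NormalDist().cdf(z)               # left-tailed

print(f"n = {n}, names ending in 'ko' = {ko_count}, p' = {p_prime:.4f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

If your own hand count differs from the script's, recheck the repeated names, which are easy to skip.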

Exercise 9.16.26 

Phillip's Wish by Suzanne Osorio 

My nephew likes to play 
Chasing the girls makes his day. 
He asked his mother 
If it is okay 
To get his ear pierced. 
She said, "No way!" 
To poke a hole through your ear, 
Is not what I want for you, dear. 
He argued his point quite well,
Says even my macho pal, Mel,
Has gotten this done.
It's all just for fun.
C'mon please, mom, please, what the hell.

Again Phillip complained to his mother,
Saying half his friends (including their brothers)
Are piercing their ears
And they have no fears
He wants to be like the others.

She said, "I think it's much less.
We must do a hypothesis test.
And if you are right,
I won't put up a fight.
But, if not, then my case will rest."

We proceeded to call fifty guys
To see whose prediction would fly.
Nineteen of the fifty
Said piercing was nifty
And earrings they'd occasionally buy.

Then there's the other thirty-one,
Who said they'd never have this done.
So now this poem's finished.
Will his hopes be diminished,
Or will my nephew have his fun?

Exercise 9.16.27 (Solution on p. 410.)

The Craven by Mark Salangsang

Once upon a morning dreary
In stats class I was weak and weary.
Pondering over last night's homework
Whose answers were now on the board
This I did and nothing more.

While I nodded nearly napping
Suddenly, there came a tapping.
As someone gently rapping,
Rapping my head as I snore.
Quoth the teacher, "Sleep no more."

"In every class you fall asleep,"
The teacher said, his voice was deep.
"So a tally I've begun to keep
Of every class you nap and snore.
The percentage being forty-four."

"My dear teacher I must confess,
While sleeping is what I do best.
The percentage, I think, must be less,
A percentage less than forty-four."
This I said and nothing more.

"We'll see," he said and walked away,
And fifty classes from that day 
He counted till the month of May 
The classes in which I napped and snored. 
The number he found was twenty-four. 

At a significance level of 0.05, 
Please tell me am I still alive? 
Or did my grade just take a dive 
Plunging down beneath the floor? 
Upon thee I hereby implore.

Exercise 9.16.28 

Toastmasters International cites a report by Gallup Poll that 40% of Americans fear public
speaking. A student believes that less than 40% of students at her school fear public speaking.
She randomly surveys 361 schoolmates and finds that 135 report they fear public speaking.
Conduct a hypothesis test to determine if the percent at her school is less than 40%. (Source:
http://toastmasters.org/artisan/detail.asp?CategoryID=1&SubCategoryID=10&ArticleID=429&Page=1 17 )

Exercise 9.16.29 (Solution on p. 410.) 

68% of online courses taught at community colleges nationwide were taught by full-time faculty. 
To test if 68% also represents California's percent for full-time faculty teaching the online classes, 
Long Beach City College (LBCC), CA, was randomly selected for comparison. In the same year, 34 
of the 44 online courses LBCC offered were taught by full-time faculty. Conduct a hypothesis test 
to determine if 68% represents CA. NOTE: For more accurate results, use more CA community 
colleges and this past year's data. (Sources: Growing by Degrees by Allen and Seaman; Amit 
Schitai, Director of Instructional Technology and Distance Learning, LBCC). 

Exercise 9.16.30 

According to an article in Bloomberg Businessweek, New York City's most recent adult smoking
rate is 14%. Suppose that a survey is conducted to determine this year's rate. Nine out of 70
randomly chosen N.Y. City residents reply that they smoke. Conduct a hypothesis test to determine if
the rate is still 14% or if it has decreased. (Source:
http://www.businessweek.com/news/2011-09-15/nyc-smoking-rate-falls-to-record-low-of-14-bloomberg-says.html 18 )

Exercise 9.16.31 (Solution on p. 410.) 

The mean age of De Anza College students in a previous term was 26.6 years old. An instructor
thinks the mean age for online students is older than 26.6. She randomly surveys 56 online
students and finds that the sample mean is 29.4 with a standard deviation of 2.1. Conduct a
hypothesis test. (Source: http://research.fhda.edu/factbook/DAdemofs/Fact_sheet_da_2006w.pdf 19 )

Exercise 9.16.32 

Registered nurses earned an average annual salary of $69,110. For that same year, a survey
was conducted of 41 California registered nurses to determine if the annual salary is higher than
$69,110 for California nurses. The sample average was $71,121 with a sample standard deviation of
$7,489. Conduct a hypothesis test. (Source: http://www.bls.gov/oes/current/oes291111.htm 20 )

Exercise 9.16.33 (Solution on p. 410.)

La Leche League International reports that the mean age of weaning a child from breastfeeding
is age 4 to 5 worldwide. In America, most nursing mothers wean their children much earlier.
Suppose a random survey is conducted of 21 U.S. mothers who recently weaned their children.
The mean weaning age was 9 months (3/4 year) with a standard deviation of 4 months. Conduct
a hypothesis test to determine if the mean weaning age in the U.S. is less than 4 years old. (Source:
http://www.lalecheleague.org/Law/BAFeb01.html 21 )

17 http://toastmasters.org/artisan/detail.asp?CategoryID=1&SubCategoryID=10&ArticleID=429&Page=1
18 http://www.businessweek.com/news/2011-09-15/nyc-smoking-rate-falls-to-record-low-of-14-bloomberg-says.html
19 http://research.fhda.edu/factbook/DAdemofs/Fact_sheet_da_2006w.pdf
20 http://www.bls.gov/oes/current/oes291111.htm



9.16.1 Try these multiple choice questions. 

Exercise 9.16.34 (Solution on p. 410.) 

When a new drug is created, the pharmaceutical company must subject it to testing before receiving
the necessary permission from the Food and Drug Administration (FDA) to market the drug.
Suppose the null hypothesis is "the drug is unsafe." What is the Type II Error?

A. To conclude the drug is safe when, in fact, it is unsafe.

B. To not conclude the drug is safe when, in fact, it is safe.

C. To conclude the drug is safe when, in fact, it is safe.

D. To not conclude the drug is unsafe when, in fact, it is unsafe.

The next two questions refer to the following information: Over the past few decades, public health 
officials have examined the link between weight concerns and teen girls smoking. Researchers surveyed a 
group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four 
years the girls were surveyed again. Sixty-three (63) said they smoked to stay thin. Is there good evidence 
that more than thirty percent of the teen girls smoke to stay thin? 

Exercise 9.16.35 (Solution on p. 410.) 

The alternate hypothesis is 

A. p < 0.30

B. p ≤ 0.30

C. p ≥ 0.30

D. p > 0.30

Exercise 9.16.36 (Solution on p. 410.) 

After conducting the test, your decision and conclusion are 

A. Reject H₀: There is sufficient evidence to conclude that more than 30% of teen girls smoke to
stay thin.

B. Do not reject H₀: There is not sufficient evidence to conclude that less than 30% of teen girls
smoke to stay thin.

C. Do not reject H₀: There is not sufficient evidence to conclude that more than 30% of teen girls
smoke to stay thin.

D. Reject H₀: There is sufficient evidence to conclude that less than 30% of teen girls smoke to
stay thin.

The next three questions refer to the following information: A statistics instructor believes that fewer 
than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of 
the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight
showing. 

Exercise 9.16.37 (Solution on p. 411.) 

An appropriate alternative hypothesis is 



A. p = 0.20

B. p > 0.20

C. p < 0.20

D. p ≤ 0.20

21 http://www.lalecheleague.org/Law/BAFeb01.html

Exercise 9.16.38 (Solution on p. 411.) 

At a 1% level of significance, an appropriate conclusion is: 

A. There is insufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is less than 20%.

B. There is sufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is more than 20%.

C. There is sufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is less than 20%.

D. There is insufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is at least 20%.

Exercise 9.16.39 (Solution on p. 411.) 

The Type I error is to conclude that the percent of EVC students who attended is 

A. at least 20%, when in fact, it is less than 20%. 

B. 20%, when in fact, it is 20%. 

C. less than 20%, when in fact, it is at least 20%. 

D. less than 20%, when in fact, it is less than 20%. 

The next two questions refer to the following information: 

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than 7 
hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a 
mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC 
Intermediate Algebra students get less than 7 hours of sleep per night, on average? 

Exercise 9.16.40 (Solution on p. 411.) 

The distribution to be used for this test is $\bar{X} \sim$

A. $N\left(7.24, \frac{1.93}{\sqrt{22}}\right)$

B. $N(7.24, 1.93)$

C. $t_{22}$

D. $t_{21}$

Exercise 9.16.41 (Solution on p. 411.) 

The Type II error is to not reject that the mean number of hours of sleep LTCC students get per 
night is at least 7 when, in fact, the mean number of hours 

A. is more than 7 hours. 

B. is at most 7 hours. 

C. is at least 7 hours. 

D. is less than 7 hours. 

The next three questions refer to the following information: Previously, an organization reported that 
teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, 
the mean is higher. Fifteen (15) randomly chosen teenagers were asked how many hours per week they 
spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a
hypothesis test. 

Exercise 9.16.42 (Solution on p. 411.) 

The null and alternate hypotheses are: 

A. $H_0$: $\bar{x} = 4.5$, $H_a$: $\bar{x} > 4.5$

B. $H_0$: $\mu \ge 4.5$, $H_a$: $\mu < 4.5$

C. $H_0$: $\mu = 4.75$, $H_a$: $\mu > 4.75$

D. $H_0$: $\mu = 4.5$, $H_a$: $\mu > 4.5$

Exercise 9.16.43 (Solution on p. 411.) 

At a significance level of $\alpha = 0.05$, what is the correct conclusion?

A. There is enough evidence to conclude that the mean number of hours is more than 4.75 

B. There is enough evidence to conclude that the mean number of hours is more than 4.5 

C. There is not enough evidence to conclude that the mean number of hours is more than 4.5 

D. There is not enough evidence to conclude that the mean number of hours is more than 4.75 

Exercise 9.16.44 (Solution on p. 411.) 

The Type I error is: 

A. To conclude that the current mean hours per week is higher than 4.5, when in fact, it is higher. 

B. To conclude that the current mean hours per week is higher than 4.5, when in fact, it is the same.

C. To conclude that the mean hours per week currently is 4.5, when in fact, it is higher.

D. To conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is not higher.
Exercise 9.17.1 (Solution on p. 411.)

Rebecca and Matt are 14 year old twins. Matt's height is 2 standard deviations below the mean 
for 14 year old boys' height. Rebecca's height is 0.10 standard deviations above the mean for 14 
year old girls' height. Interpret this. 

A. Matt is 2.1 inches shorter than Rebecca 

B. Rebecca is very tall compared to other 14 year old girls. 

C. Rebecca is taller than Matt. 

D. Matt is shorter than the average 14 year old boy. 

Exercise 9.17.2 (Solution on p. 411.) 

Construct a histogram of the IPO data (see Table of Contents, 14. Appendix, Data Sets). Use 5 
intervals. 

The next three exercises refer to the following information: Ninety homeowners were asked the number 
of estimates they obtained before having their homes fumigated. X = the number of estimates. 



X    Rel. Freq.    Cumulative Rel. Freq.
1    0.3
2    0.2
4    0.4
5    0.1





Table 9.5 

Complete the cumulative relative frequency column. 

Exercise 9.17.3 (Solution on p. 411.) 

Calculate the sample mean (a), the sample standard deviation (b) and the percent of the estimates 
that fall at or below 4 (c). 

Exercise 9.17.4 (Solution on p. 411.) 

Calculate the median, M, the first quartile, Q1, and the third quartile, Q3. Then construct a boxplot of
the data. 



Exercise 9.17.5 (Solution on p. 411.)

The middle 50% of the data are between _____ and _____.



The next three questions refer to the following table: Seventy 5th and 6th graders were asked their favorite 
dinner. 





              Pizza    Hamburgers    Spaghetti    Fried shrimp
5th grader    15       6             9            0
6th grader    15       7             10           8



Table 9.6 

Exercise 9.17.6 (Solution on p. 411.) 

Find the probability that one randomly chosen child is in the 6th grade and prefers fried shrimp. 



A. [unreadable in source]
B. [unreadable in source]
C. [unreadable in source]
D. $\frac{8}{70}$

2 This content is available online at <http://cnx.org/content/m17013/1.12/>.



Exercise 9.17.7 (Solution on p. 411.) 

Find the probability that a child does not prefer pizza. 



A. 30/[denominator unreadable in source]
B. 30/[denominator unreadable in source]
C. $\frac{40}{70}$
D. 1



Exercise 9.17.8 (Solution on p. 411.) 

Find the probability a child is in the 5th grade given that the child prefers spaghetti. 



A. $\frac{9}{19}$
B. $\frac{9}{70}$
C. $\frac{9}{30}$
D. $\frac{19}{70}$



Exercise 9.17.9 (Solution on p. 411.) 

A sample of convenience is a random sample. 

A. true 

B. false 

Exercise 9.17.10 (Solution on p. 411.) 

A statistic is a number that is a property of the population. 

A. true 

B. false 

Exercise 9.17.11 (Solution on p. 411.) 

You should always throw out any data that are outliers. 

A. true 

B. false 

Exercise 9.17.12 (Solution on p. 411.) 

Lee bakes pies for a small restaurant in Felton, CA. She generally bakes 20 pies in a day, on the 
average. Of interest is the number of pies she bakes each day.

a. Define the Random Variable X. 

b. State the distribution for X. 

c. Find the probability that Lee bakes more than 25 pies in any given day. 

Exercise 9.17.13 (Solution on p. 412.) 

Six different brands of Italian salad dressing were randomly selected at a supermarket. The grams 
of fat per serving are 7, 7, 9, 6, 8, 5. Assume that the underlying distribution is normal. Calculate a 
95% confidence interval for the population mean grams of fat per serving of Italian salad dressing 
sold in supermarkets. 

Exercise 9.17.14 (Solution on p. 412.) 

Given: uniform, exponential, normal distributions. Match each to a statement below. 
a. mean = median $\ne$ mode

b. mean > median > mode 

c. mean = median = mode 
9.18 Lab: Hypothesis Testing of a Single Mean and Single Proportion

Class Time: 
Names: 

9.18.1 Student Learning Outcomes: 

• The student will select the appropriate distributions to use in each case. 

• The student will conduct hypothesis tests and interpret the results. 



9.18.2 Television Survey 

In a recent survey, it was stated that Americans watch television on average four hours per day. Assume 
that $\sigma = 2$. Using your class as the sample, conduct a hypothesis test to determine if the average for
students at your school is lower. 

1. $H_0$:

2. $H_a$:

3. In words, define the random variable. = 

4. The distribution to use for the test is: 



5. Determine the test statistic using your data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance. 

a. Graph: 



3 This content is available online at <http://cnx.org/content/m17007/1.12/>.
Figure 9.9



b. Determine the p-value: 

7. Do you or do you not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 



9.18.3 Language Survey 

About 42.3% of Californians and 19.6% of all Americans over age 5 speak a language other than English 
at home. Using your class as the sample, conduct a hypothesis test to determine if the percent of the 
students at your school that speak a language other than English at home is different from 42.3%. (Source: 
http://www.census.gov/hhes/socdemo/language/ 4 )

1. $H_0$:

2. $H_a$:

3. In words, define the random variable. = 

4. The distribution to use for the test is: 

5. Determine the test statistic using your data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance. 



4 http://cnx.org/content/m17007/latest/ http://www.census.gov/hhes/socdemo/language/
a. Graph:



Figure 9.10 

b. Determine the p-value: 

7. Do you or do you not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 



9.18.4 Jeans Survey 

Suppose that young adults own an average of 3 pairs of jeans. Survey 8 people from your class to determine 
if the average is higher than 3. 



1. $H_0$:

2. $H_a$:

3. In words, define the random variable. =

4. The distribution to use for the test is:

5. Determine the test statistic using your data.

6. Draw a graph and label it appropriately. Shade the actual level of significance. 

a. Graph: 
Figure 9.11



b. Determine the p-value: 

7. Do you or do you not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 
Solutions to Exercises in Chapter 9

Solutions to Practice 1: Single Mean, Known Population Standard Deviation 

Solution to Exercise 9.13.1 (p. 381) 
Means 
Solution to Exercise 9.13.2 (p. 381) 

a. $H_0$: $\mu = 2.5$ (or, $H_0$: $\mu \le 2.5$)
b. $H_a$: $\mu > 2.5$

Solution to Exercise 9.13.3 (p. 381) 

right-tailed 
Solution to Exercise 9.13.4 (p. 381) 

$\bar{X}$
Solution to Exercise 9.13.5 (p. 381) 

The mean time spent in jail for 26 first time convicted burglars 
Solution to Exercise 9.13.6 (p. 381) 

Yes, 1.5 

Solution to Exercise 9.13.7 (p. 381) 

a. 3 

b. 1.5 

c. 1.8 

d. 26 

Solution to Exercise 9.13.8 (p. 381) 
$\sigma$
Solution to Exercise 9.13.9 (p. 381) 

$\bar{X} \sim N\left(2.5, \frac{1.5}{\sqrt{26}}\right)$

Solution to Exercise 9.13.11 (p. 382) 

0.0446 

Solution to Exercise 9.13.12 (p. 382) 

a. Reject the null hypothesis 

Solutions to Practice 2: Single Mean, Unknown Population Standard Deviation 

Solution to Exercise 9.14.1 (p. 383) 

averages 

Solution to Exercise 9.14.2 (p. 383) 

a. $H_0$: $\mu = 15$

b. $H_a$: $\mu \ne 15$

Solution to Exercise 9.14.3 (p. 383) 

two-tailed 

Solution to Exercise 9.14.4 (p. 383) 

$\bar{X}$
Solution to Exercise 9.14.5 (p. 383) 

the mean time spent on death row for the 26 inmates 
Solution to Exercise 9.14.6 (p. 383) 

No 

Solution to Exercise 9.14.7 (p. 383) 
a. 17.4

b. s 

c. 75 

Solution to Exercise 9.14.8 (p. 383) 
t-test
Solution to Exercise 9.14.9 (p. 383) 

$t_{74}$

Solution to Exercise 9.14.11 (p. 384) 
0.0015 
Solution to Exercise 9.14.12 (p. 384) 

a. Reject the null hypothesis 

Solutions to Practice 3: Single Proportion 

Solution to Exercise 9.15.1 (p. 385) 
Proportions 
Solution to Exercise 9.15.2 (p. 385) 

a. H : p = 0.095 

b. H a : p < 0.095 

Solution to Exercise 9.15.3 (p. 385) 
left-tailed 

Solution to Exercise 9.15.4 (p. 385) 
P' 

Solution to Exercise 9.15.5 (p. 385) 

the proportion of people in that town surveyed suffering from depression or a depressive illness 
Solution to Exercise 9.15.6 (p. 385) 

a. 7 

b. 100 

c. 0.07 

Solution to Exercise 9.15.7 (p. 385) 
0.0293 

Solution to Exercise 9.15.8 (p. 385) 
Normal 

Solution to Exercise 9.15.10 (p. 386) 
0.1969 
Solution to Exercise 9.15.11 (p. 386) 

a. Do not reject the null hypothesis 

Solutions to Homework 
Solution to Exercise 9.16.1 (p. 387) 

a. $H_0$: $\mu = 34$; $H_a$: $\mu \ne 34$

c. $H_0$: $\mu \ge 100{,}000$; $H_a$: $\mu < 100{,}000$

d. $H_0$: $p = 0.29$; $H_a$: $p \ne 0.29$
g. $H_0$: $p = 0.50$; $H_a$: $p \ne 0.50$
i. $H_0$: $p \ge 0.11$; $H_a$: $p < 0.11$
Solution to Exercise 9.16.2 (p. 387)

a. Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We do 
not conclude that the mean is not 34 years, when it is not really 34 years. 

c. Type I error: We conclude that the mean is less than $100,000, when it really is at least $100,000. Type II
error: We do not conclude that the mean is less than $100,000, when it is really less than $100,000.

d. Type I error: We conclude that the proportion of h.s. seniors who get drunk each month is not 29%,
when it really is 29%. Type II error: We do not conclude that the proportion of h.s. seniors who get
drunk each month is not 29%, when it is really not 29%.

i. Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error:
We do not conclude that the proportion is less than 11%, when it really is less than 11%.

Solution to Exercise 9.16.5 (p. 387) 

e. z = -2.71 

f. 0.0034 

h. Decision: Reject null; Conclusion: $\mu < 19$
i. (17.449,18.757) 

Solution to Exercise 9.16.7 (p. 388) 

e. 3.5 

f. 0.0005 

h. Decision: Reject null; Conclusion: $\mu > 4.5$
i. (4.7553,5.4447) 

Solution to Exercise 9.16.9 (p. 388) 

e. 2.7 

f. 0.0042 

h. Decision: Reject Null 
i. (80.789,85.211) 

Solution to Exercise 9.16.11 (p. 388) 

d. $t_{11}$

e. 1.96 

f. 0.0380 

h. Decision: Reject null when $\alpha = 0.05$; do not reject null when $\alpha = 0.01$
i. (3.8865,5.9468) 

Solution to Exercise 9.16.13 (p. 388) 

e. -1.64 

f. 0.1000 

h. Decision: Do not reject null 
i. (0.3216,0.4784) 

Solution to Exercise 9.16.15 (p. 389) 

d. t 9 

e. -1.33 

f. 0.1086 

h. Decision: Do not reject null 
i. (51.886,62.114) 

Solution to Exercise 9.16.19 (p. 390) 
e. 1.65

f. 0.0984 

h. Decision: Do not reject null 
i. (0.6836,0.8533) 

Solution to Exercise 9.16.21 (p. 391) 

e. -2.39 

f. 0.0093 

h. Decision: Reject null 
i. (91.854,100.15) 

Solution to Exercise 9.16.23 (p. 392) 

e. -1.82 

f. 0.0345 

h. Decision: Do not reject null 
i. (0.5331,0.7685) 

Solution to Exercise 9.16.25 (p. 394) 

e. z = -2.99 

f. 0.0014 

h. Decision: Reject null; Conclusion: p < .70 
i. (0.4529,0.6582) 

Solution to Exercise 9.16.27 (p. 395) 

e. 0.57 

f. 0.7156 

h. Decision: Do not reject null 
i. (0.3415,0.6185) 

Solution to Exercise 9.16.29 (p. 396) 

e. 1.32 

f. 0.1873 

h. Decision: Do not reject null 
i. (0.65,0.90) 

Solution to Exercise 9.16.31 (p. 396) 

e. 9.98 

f. 0.0000 

h. Decision: Reject null 
i. (28.8,30.0) 

Solution to Exercise 9.16.33 (p. 396) 

e. -44.7 

f. 0.0000 

h. Decision: Reject null 
i. (0.60,0.90) -in years 

Solution to Exercise 9.16.34 (p. 397) 

B 

Solution to Exercise 9.16.35 (p. 397) 

D 
Solution to Exercise 9.16.36 (p. 397)

C 

Solution to Exercise 9.16.37 (p. 397) 

C 

Solution to Exercise 9.16.38 (p. 398) 

A 

Solution to Exercise 9.16.39 (p. 398) 

C 

Solution to Exercise 9.16.40 (p. 398) 

D 

Solution to Exercise 9.16.41 (p. 398) 

D 

Solution to Exercise 9.16.42 (p. 399) 

D 

Solution to Exercise 9.16.43 (p. 399) 

C 

Solution to Exercise 9.16.44 (p. 399) 

B 



Solutions to Review 

Solution to Exercise 9.17.1 (p. 400) 
D 
Solution to Exercise 9.17.2 (p. 400) 

No solution provided. There are several ways in which the histogram could be constructed. 
Solution to Exercise 9.17.3 (p. 400) 

a. 2.8 

b. 1.48 

c. 90% 

Solution to Exercise 9.17.4 (p. 400) 



M = 3; Q1 = 1; Q3 = 4

Solution to Exercise 9.17.5 (p. 400)

1 and 4
Solution to Exercise 9.17.6 (p. 400) 

D 
Solution to Exercise 9.17.7 (p. 401) 

C 
Solution to Exercise 9.17.8 (p. 401) 

A 
Solution to Exercise 9.17.9 (p. 401) 

B 
Solution to Exercise 9.17.10 (p. 401) 

B 
Solution to Exercise 9.17.11 (p. 401) 

B 
Solution to Exercise 9.17.12 (p. 401) 

b. P(20) 

c. 0.1122 
Solution to Exercise 9.17.13 (p. 401)

CI: (5.52,8.48) 

Solution to Exercise 9.17.14 (p. 401) 

a. uniform 

b. exponential 

c. normal 



Chapter 10 

Hypothesis Testing: Two Means, Paired 
Data, Two Proportions 



10.1 Hypothesis Testing: Two Population Means and Two Population 
Proportions 1 

10.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Classify hypothesis tests by type. 

• Conduct and interpret hypothesis tests for two population means, population standard deviations 
known. 

• Conduct and interpret hypothesis tests for two population means, population standard deviations 
unknown. 

• Conduct and interpret hypothesis tests for two population proportions. 

• Conduct and interpret hypothesis tests for matched or paired samples. 

10.1.2 Introduction 

Studies often compare two groups. For example, researchers are interested in the effect aspirin has in 
preventing heart attacks. Over the last few years, newspapers and magazines have reported about various 
aspirin studies involving two groups. Typically, one group is given aspirin and the other group is given a 
placebo. Then, the heart attack rate is studied over several years. 

There are other situations that deal with the comparison of two groups. For example, studies compare var- 
ious diet and exercise programs. Politicians compare the proportion of individuals from different income 
brackets who might vote for them. Students are interested in whether SAT or GRE preparatory courses 
really help raise their scores. 

In the previous chapter, you learned to conduct hypothesis tests on single means and single proportions. 
You will expand upon that in this chapter. You will compare two means or two proportions to each other. 
The general procedure is still the same, just expanded. 



1 This content is available online at <http://cnx.org/content/m17029/1.9/>.
To compare two means or two proportions, you work with two groups. The groups are classified either as
independent or matched pairs. Independent groups mean that the two samples taken are independent, 
that is, sample values selected from one population are not related in any way to sample values selected 
from the other population. Matched pairs consist of two samples that are dependent. The parameter tested 
using matched pairs is the population mean. The parameters tested using independent groups are either 
population means or population proportions. 

NOTE: This chapter relies on either a calculator or a computer to calculate the degrees of freedom, 
the test statistics, and p-values. TI-83+ and TI-84 instructions are included as well as the test statis- 
tic formulas. When using the TI-83+/TI-84 calculators, we do not need to separate two population 
means, independent groups, population variances unknown into large and small sample sizes. 
However, most statistical computer software has the ability to differentiate these tests. 

This chapter deals with the following hypothesis tests: 
Independent groups (samples are independent) 

• Test of two population means. 

• Test of two population proportions. 

Matched or paired samples (samples are dependent) 

• Becomes a test of one population mean. 



10.2 Comparing Two Independent Population Means with Unknown 
Population Standard Deviations 2 

1. The two independent samples are simple random samples from two distinct populations. 

2. Both populations are normally distributed with the population means and standard deviations un- 
known unless the sample sizes are greater than 30. In that case, the populations need not be normally 
distributed. 

NOTE: The test comparing two independent population means with unknown and possibly unequal
population standard deviations is called the Aspin-Welch t-test. The degrees of freedom
formula was developed by Aspin and Welch.

The comparison of two population means is very common. A difference between the two samples depends
on both the means and the standard deviations. Very different means can occur by chance if there is great
variation among the individual samples. In order to account for the variation, we take the difference of
the sample means, $\bar{X}_1 - \bar{X}_2$, and divide by the standard error (shown below) in order to standardize the
difference. The result is a t-score test statistic (shown below).

Because we do not know the population standard deviations, we estimate them using the two sample
standard deviations from our independent samples. For the hypothesis test, we calculate the estimated
standard deviation, or standard error, of the difference in sample means, $\bar{X}_1 - \bar{X}_2$.

The standard error is:

$$\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}} \tag{10.1}$$

The test statistic (t-score) is calculated as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}} \tag{10.2}$$

2 This content is available online at <http://cnx.org/content/m17025/1.18/>.

where: 

• $s_1$ and $s_2$, the sample standard deviations, are estimates of $\sigma_1$ and $\sigma_2$, respectively.

• $\sigma_1$ and $\sigma_2$ are the unknown population standard deviations.

• $\bar{x}_1$ and $\bar{x}_2$ are the sample means. $\mu_1$ and $\mu_2$ are the population means.

The degrees of freedom (df) is a somewhat complicated calculation. However, a computer or calculator
calculates it easily. The df is not always a whole number. The test statistic calculated above is approximated
by the Student's t distribution with df as follows:

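Degrees of freedom

$$df = \frac{\left(\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{(s_1)^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{(s_2)^2}{n_2}\right)^2} \tag{10.3}$$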



When both sample sizes $n_1$ and $n_2$ are five or larger, the Student's t approximation is very good. Notice that
the sample variances $(s_1)^2$ and $(s_2)^2$ are not pooled. (If the question comes up, do not pool the variances.)

NOTE: It is not necessary to compute this by hand. A calculator or computer easily computes it. 

Example 10.1: Independent groups 

The average amount of time boys and girls ages 7 through 11 spend playing sports each day is 
believed to be the same. An experiment is done, data is collected, resulting in the table below. 
Both populations have a normal distribution. 





         Sample Size    Average Number of Hours Playing Sports Per Day    Sample Standard Deviation
Girls    9              2 hours                                           $\sqrt{0.75}$
Boys     16             3.2 hours                                         1.00



Table 10.1 

Problem 

Is there a difference in the mean amount of time boys and girls ages 7 through 11 play sports each
day? Test at the 5% level of significance. 

Solution 

The population standard deviations are not known. Let g be the subscript for girls and b be the
subscript for boys. Then, $\mu_g$ is the population mean for girls and $\mu_b$ is the population mean for
boys. This is a test of two independent groups, two population means.

Random variable: $\bar{X}_g - \bar{X}_b$ = difference in the sample mean amount of time girls and boys play
sports each day.

$H_0$: $\mu_g = \mu_b$    ($\mu_g - \mu_b = 0$)

$H_a$: $\mu_g \ne \mu_b$    ($\mu_g - \mu_b \ne 0$)
The words "the same" tell you $H_0$ has an "=". Since there are no other words to indicate $H_a$,
assume "is different." This is a two-tailed test.

Distribution for the test: Use $t_{df}$, where df is calculated using the df formula for independent
groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the
variances.

Calculate the p-value using a Student's t distribution: p-value = 0.0054

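Graph: two-tailed Student's t curve on the $\bar{x}_g - \bar{x}_b$ axis, centered at 0 (from $H_0$: $\mu_g - \mu_b = 0$),
with an area of $\frac{1}{2}$(p-value) = 0.0028 shaded in each tail, below -1.2 and above 1.2.

Figure 10.1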



v / 075 



s h = 1 



So, Xa — Xf, = 2 — 3.2 



-1.2 



Half the p-value is below -1.2 and half is above 1.2. 

Make a decision: Since a. > p-value, reject H . 

This means you reject jig — ]i\>. The means are different. 

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to
conclude that the mean number of hours that girls and boys aged 7 through 11 play sports per day
is different (mean number of hours boys aged 7 through 11 play sports per day is greater than the
mean number of hours played by girls OR the mean number of hours girls aged 7 through 11 play
sports per day is greater than the mean number of hours played by boys).

NOTE: TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press 4:2-SampTTest. Arrow over
to Stats and press ENTER. Arrow down and enter 2 for the first sample mean, $\sqrt{0.75}$ for Sx1, 9
for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to $\mu_1$: and arrow
to does not equal $\mu_2$. Press ENTER. Arrow down to Pooled: and No. Press ENTER. Arrow down to
Calculate and press ENTER. The p-value is p = 0.0054, the df is approximately 18.8462, and the
test statistic is -3.14. Do the procedure again but instead of Calculate do Draw.
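
For readers working on a computer rather than a calculator, the same calculation can be sketched in Python. This is only an illustrative sketch: the function names (welch_t_test, t_pdf, two_tailed_p) are made up for this example and are not part of the text, and the p-value is approximated by numerically integrating the Student's t density with Simpson's rule rather than by calling a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| > |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the curve between 0 and |t|
    return 1.0 - 2.0 * area       # what remains in the two tails

def welch_t_test(mean1, s1, n1, mean2, s2, n2):
    """Two-tailed test of H0: mu1 = mu2 from summary statistics, variances not pooled."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2        # (s_i)^2 / n_i
    se = math.sqrt(v1 + v2)                    # standard error, formula (10.1)
    t = (mean1 - mean2) / se                   # formula (10.2) with mu1 - mu2 = 0
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # formula (10.3)
    return t, df, two_tailed_p(t, df)

# Example 10.1: girls (mean 2, s = sqrt(0.75), n = 9) and boys (mean 3.2, s = 1, n = 16).
# The text reports t = -3.14, df of about 18.8462, and p-value = 0.0054 for these inputs.
t, df, p = welch_t_test(2, math.sqrt(0.75), 9, 3.2, 1.0, 16)
print(round(t, 2), round(df, 4), round(p, 4))
```

Because the variances are not pooled, the degrees of freedom come out as a non-integer, which is why the sketch integrates the density directly instead of looking up a table row.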
Example 10.2

A study is done by a community group in two neighboring colleges to determine which one grad- 
uates students with more math classes. College A samples 11 graduates. Their average is 4 math 
classes with a standard deviation of 1.5 math classes. College B samples 9 graduates. Their aver- 
age is 3.5 math classes with a standard deviation of 1 math class. The community group believes 
that a student who graduates from college A has taken more math classes, on the average. Both 
populations have a normal distribution. Test at a 1% significance level. Answer the following 
questions. 

Problem 1 (Solution on p. 450.)

Is this a test of two means or two proportions?

Problem 2 (Solution on p. 450.)

Are the population standard deviations known or unknown?

Problem 3 (Solution on p. 450.)

Which distribution do you use to perform the test?

Problem 4 (Solution on p. 450.)

What is the random variable?

Problem 5 (Solution on p. 450.)

What are the null and alternate hypotheses?

Problem 6 (Solution on p. 450.)

Is this test right, left, or two tailed?

Problem 7 (Solution on p. 450.)

What is the p-value?

Problem 8 (Solution on p. 450.)

Do you reject or not reject the null hypothesis?

Conclusion:

At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude
that a student who graduates from college A has taken more math classes, on the average, than a
student who graduates from college B.



10.3 Comparing Two Independent Population Means with Known Population
Standard Deviations 3

Even though this situation is not likely (knowing the population standard deviations is not likely), the
following example illustrates hypothesis testing for independent means, known population standard
deviations. The sampling distribution for the difference between the means is normal and both populations
must be normal. The random variable is $\bar{X}_1 - \bar{X}_2$. The normal distribution has the following format:

Normal distribution

$$\bar{X}_1 - \bar{X}_2 \sim N\left[\mu_1 - \mu_2, \sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}}\right] \tag{10.4}$$



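3 This content is available online at <http://cnx.org/content/m17042/1.10/>.

The standard deviation is:

$$\sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}} \tag{10.5}$$

The test statistic (z-score) is:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}}} \tag{10.6}$$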



Example 10.3 

Independent groups, population standard deviations known: The mean lasting time of 2 competing
floor waxes is to be compared. Twenty floors are randomly assigned to test each wax. Both
populations have a normal distribution. The following table is the result.



Wax    Sample Mean Number of Months Floor Wax Lasts    Population Standard Deviation
1      3                                               0.33
2      2.9                                             0.36



Table 10.2 

Problem 

Does the data indicate that wax 1 is more effective than wax 2? Test at a 5% level of significance. 

Solution 

This is a test of two independent groups, two population means, population standard deviations 
known. 

Random Variable: $\bar{X}_1 - \bar{X}_2$ = difference in the mean number of months the competing floor waxes
last.

$H_0$: $\mu_1 \le \mu_2$

$H_a$: $\mu_1 > \mu_2$

The words "is more effective" say that wax 1 lasts longer than wax 2, on the average. "Longer"
is a ">" symbol and goes into $H_a$. Therefore, this is a right-tailed test.

Distribution for the test: The population standard deviations are known so the distribution is 
normal. Using the formula above, the distribution is: 



$$\bar{X}_1 - \bar{X}_2 \sim N\left[0, \sqrt{\frac{0.33^2}{20} + \frac{0.36^2}{20}}\right]$$

Since $\mu_1 \le \mu_2$, then $\mu_1 - \mu_2 \le 0$ and the mean for the normal distribution is 0.

Calculate the p-value using the normal distribution: p-value = 0.1799

Graph:
Right-tailed normal curve on the $\bar{X}_1 - \bar{X}_2$ axis, centered at 0 (from $H_0$: $\mu_1 - \mu_2 \le 0$),
with the area to the right of 0.1 shaded; p-value = 0.1799.

Figure 10.2

$\bar{x}_1 - \bar{x}_2 = 3 - 2.9 = 0.1$

Compare $\alpha$ and the p-value: $\alpha$ = 0.05 and p-value = 0.1799. Therefore, $\alpha$ < p-value.

Make a decision: Since $\alpha$ < p-value, do not reject $H_0$.

Conclusion: At the 5% level of significance, from the sample data, there is not sufficient evidence 
to conclude that the mean time wax 1 lasts is longer (wax 1 is more effective) than the mean time 
wax 2 lasts. 

NOTE: TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press 3:2-SampZTest. Arrow over
to Stats and press ENTER. Arrow down and enter .33 for sigma1, .36 for sigma2, 3 for the first
sample mean, 20 for n1, 2.9 for the second sample mean, and 20 for n2. Arrow down to $\mu_1$: and
arrow to > $\mu_2$. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.1799
and the test statistic is 0.9157. Do the procedure again but instead of Calculate do Draw.
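
The floor wax calculation can also be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name two_sample_z_test is an assumption made for this illustration and does not come from the text.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, sigma1, n1, mean2, sigma2, n2):
    """Right-tailed test of H0: mu1 <= mu2 against Ha: mu1 > mu2, sigmas known."""
    # Standard deviation of the difference in sample means, formula (10.5)
    sd_diff = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    # z-score, formula (10.6), with mu1 - mu2 = 0 under the null hypothesis
    z = (mean1 - mean2) / sd_diff
    # Right-tailed p-value: area under the standard normal curve to the right of z
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value

# Example 10.3: wax 1 (mean 3, sigma 0.33, n = 20) and wax 2 (mean 2.9, sigma 0.36, n = 20).
# The text reports a test statistic of 0.9157 and a p-value of 0.1799.
z, p = two_sample_z_test(3.0, 0.33, 20, 2.9, 0.36, 20)
print(round(z, 4), round(p, 4))
```

Since the p-value is larger than 0.05, the sketch leads to the same decision as the calculator: do not reject the null hypothesis.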



10.4 Comparing Two Independent Population Proportions 4 

1. The two independent samples are simple random samples that are independent. 

2. The number of successes is at least five and the number of failures is at least five for each of the 
samples. 

Comparing two proportions, like comparing two means, is common. If two estimated proportions are 
different, it may be due to a difference in the populations or it may be due to chance. A hypothesis test can 
help determine if a difference in the estimated proportions ($P'_A - P'_B$) reflects a difference in the population
proportions.

The difference of two proportions follows an approximate normal distribution. Generally, the null
hypothesis states that the two proportions are the same. That is, $H_0$: $p_A = p_B$. To conduct the test, we use a pooled
proportion, $p_c$.



4 This content is available online at <http://cnx.org/content/m17043/1.12/>.

The pooled proportion is calculated as follows:

$$p_c = \frac{x_A + x_B}{n_A + n_B} \tag{10.7}$$

The distribution for the differences is:

$$P'_A - P'_B \sim N\left[0, \sqrt{p_c (1 - p_c)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}\right] \tag{10.8}$$

The test statistic (z-score) is:

$$z = \frac{(p'_A - p'_B) - (p_A - p_B)}{\sqrt{p_c (1 - p_c)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}} \tag{10.9}$$

Example 10.4: Two population proportions 

Two types of medication for hives are being tested to determine if there is a difference in the 
proportions of adult patient reactions. Twenty out of a random sample of 200 adults given med- 
ication A still had hives 30 minutes after taking the medication. Twelve out of another random 
sample of 200 adults given medication B still had hives 30 minutes after taking the medication. 
Test at a 1% level of significance. 

10.4.1 Determining the solution

This is a test of 2 population proportions.

Problem (Solution on p. 450.)

How do you know?

Let A and B be the subscripts for medication A and medication B. Then $p_A$ and $p_B$ are the desired
population proportions.

Random Variable:

$P'_A - P'_B$ = difference in the proportions of adult patients who did not react after 30 minutes to
medication A and medication B.

$H_0$: $p_A = p_B$    ($p_A - p_B = 0$)

$H_a$: $p_A \ne p_B$    ($p_A - p_B \ne 0$)

The words "is a difference" tell you the test is two-tailed. 

Distribution for the test: Since this is a test of two binomial population proportions, the
distribution is normal:

$$p_c = \frac{x_A + x_B}{n_A + n_B} = \frac{20 + 12}{200 + 200} = 0.08 \qquad 1 - p_c = 0.92$$

Therefore,

$$P'_A - P'_B \sim N\left[0, \sqrt{(0.08)(0.92)\left(\frac{1}{200} + \frac{1}{200}\right)}\right]$$

$P'_A - P'_B$ follows an approximate normal distribution.

Calculate the p-value using the normal distribution: p-value = 0.1404.
Estimated proportion for group A: $p'_A = \frac{x_A}{n_A} = \frac{20}{200} = 0.1$
Estimated proportion for group B: $p'_B = \frac{x_B}{n_B} = \frac{12}{200} = 0.06$

Graph (Figure 10.3): normal curve centered at $p_A - p_B = 0$ (from $H_0$), with both tails shaded; each
tail has area $\frac{1}{2}$(p-value) = 0.0702.



$P'_A - P'_B = 0.1 - 0.06 = 0.04$.

Half the p-value is below -0.04 and half is above 0.04. 
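
The p-value comes from the z-score of the observed difference. As a worked check (a sketch that carries a few extra decimal places than the text reports, with the hypothesized difference $p_A - p_B$ set to 0 under the null hypothesis):

```latex
\begin{aligned}
z &= \frac{(p'_A - p'_B) - 0}{\sqrt{p_c (1 - p_c)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}
   = \frac{0.04}{\sqrt{(0.08)(0.92)(0.01)}}
   \approx \frac{0.04}{0.0271} \approx 1.47 \\
\text{p-value} &= 2\,P(Z > 1.4744) \approx 2(0.0702) = 0.1404
\end{aligned}
```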

Compare $\alpha$ and the p-value: $\alpha = 0.01$ and the p-value = 0.1404. $\alpha$ < p-value.

Make a decision: Since $\alpha$ < p-value, do not reject $H_0$.

Conclusion: At a 1% level of significance, from the sample data, there is not sufficient evidence to 
conclude that there is a difference in the proportions of adult patients who did not react after 30 
minutes to medication A and medication B. 

NOTE: TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press 6:2-PropZTest. Arrow down
and enter 20 for x1, 200 for n1, 12 for x2, and 200 for n2. Arrow down to p1: and arrow to not
equal p2. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.1404
and the test statistic is 1.47. Do the procedure again but instead of Calculate do Draw.
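
For readers without a TI calculator, the same test can be sketched in Python. This is a minimal illustration, not part of the original text: the function name `two_prop_z_test` is hypothetical, and the two-tailed p-value uses the standard-library identity $2P(Z > |z|) = \operatorname{erfc}(|z|/\sqrt{2})$.

```python
import math


def two_prop_z_test(x_a, n_a, x_b, n_b):
    """Two-tailed z-test of H0: p_A = p_B using the pooled proportion.

    Hypothetical helper for illustration; returns (p_c, z, p_value).
    """
    p_a = x_a / n_a                      # estimated proportion, group A
    p_b = x_b / n_b                      # estimated proportion, group B
    p_c = (x_a + x_b) / (n_a + n_b)      # pooled proportion, equation (10.7)
    # Standard deviation of P'_A - P'_B under H0, equation (10.8)
    se = math.sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se                 # equation (10.9) with p_A - p_B = 0
    # Two-tailed p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_c, z, p_value


# Data from Example 10.4: 20 of 200 (medication A), 12 of 200 (medication B)
p_c, z, p_value = two_prop_z_test(20, 200, 12, 200)
print(f"pooled proportion = {p_c:.2f}")
print(f"z = {z:.2f}")
print(f"p-value = {p_value:.4f}")
```

The printed values should agree with the calculator results quoted in the note (test statistic about 1.47, p-value about 0.1404).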



10.5 Matched or Paired Samples 5 

1. Simple random sampling is used. 

2. Sample sizes are often small. 

3. Two measurements (samples) are drawn from the same pair of individuals or objects. 

4. Differences are calculated from the matched or paired samples. 

5. The differences form the sample that is used for the hypothesis test. 



6. The matched pairs have differences that either come from a population that is normal or the number of
differences is sufficiently large so the distribution of the sample mean of differences is approximately
normal.

5 This content is available online at <http://cnx.org/content/m17033/1.15/>.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are
calculated. The differences are the data. The population mean for the differences, $\mu_d$, is then tested using
a Student-t test for a single population mean with $n - 1$ degrees of freedom where $n$ is the number of
differences.

The test statistic (t-score) is:

$$t = \frac{\bar{x}_d - \mu_d}{\left(\frac{s_d}{\sqrt{n}}\right)} \tag{10.10}$$



Example 10.5: Matched or paired samples 

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results 
for randomly selected subjects are shown in the table. The "before" value is matched to an "after" 
value and the differences are calculated. The differences have a normal distribution. 



Subject:   A     B     C     D      E      F     G     H
Before     6.6   6.5   9.0   10.3   11.3   8.1   6.3   11.6
After      6.8   2.4   7.4   8.5    8.1    6.1   3.4   2.0



Table 10.3 

Problem 

Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level. 

Solution 

Corresponding "before" and "after" values form matched pairs. (Calculate "after" - "before".)



After Data   Before Data   Difference
6.8          6.6            0.2
2.4          6.5           -4.1
7.4          9             -1.6
8.5          10.3          -1.8
8.1          11.3          -3.2
6.1          8.1           -2
3.4          6.3           -2.9
2            11.6          -9.6



Table 10.4 

The data for the test are the differences: {0.2, -4.1, -1.6, -1.8, -3.2, -2, -2.9, -9.6} 

The sample mean and sample standard deviation of the differences are: $\bar{x}_d = -3.13$ and
$s_d = 2.91$. Verify these values.

Let $\mu_d$ be the population mean for the differences. We use the subscript d to denote "differences."




Random Variable: $\bar{X}_d$ = the mean difference of the sensory measurements

$$H_0: \mu_d \geq 0 \tag{10.11}$$

There is no improvement. ($\mu_d$ is the population mean of the differences.)

$$H_a: \mu_d < 0 \tag{10.12}$$

There is improvement. The score should be lower after hypnotism so the difference ought to be 
negative to indicate improvement. 

Distribution for the test: The distribution is a Student-t with $df = n - 1 = 8 - 1 = 7$. Use $t_7$.
(Notice that the test is for a single population mean.)

Calculate the p-value using the Student-t distribution: p-value = 0.0095 
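
As a worked check of where this p-value comes from (a sketch using the rounded sample statistics $\bar{x}_d = -3.13$ and $s_d = 2.91$ from this example, with the usual one-sample t-score applied to the $n = 8$ differences and $\mu_d = 0$ under the null hypothesis):

```latex
\begin{aligned}
t &= \frac{\bar{x}_d - \mu_d}{s_d / \sqrt{n}}
   = \frac{-3.13 - 0}{2.91 / \sqrt{8}}
   \approx \frac{-3.13}{1.029} \approx -3.04 \\
\text{p-value} &= P(t_7 < -3.04) \approx 0.0095
\end{aligned}
```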

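Graph (Figure 10.4): Student-t curve centered at $\mu_d = 0$ (from $H_0$), with the sample mean
$\bar{x}_d = -3.13$ marked on the horizontal axis and the left tail below it shaded; p-value = 0.0095.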

$\bar{X}_d$ is the random variable for the differences.

The sample mean and sample standard deviation of the differences are:

$\bar{x}_d = -3.13$

$s_d = 2.91$

Compare $\alpha$ and the p-value: $\alpha = 0.05$ and p-value = 0.0095. $\alpha$ > p-value.

Make a decision: Since $\alpha$ > p-value, reject $H_0$.

This means that $\mu_d < 0$ and there is improvement.

Conclusion: At a 5% level of significance, from the sample data, there is sufficient evidence to con- 
clude that the sensory measurements, on average, are lower after hypnotism. Hypnotism appears 
to be effective in reducing pain. 

NOTE: For the TI-83+ and TI-84 calculators, you can either calculate the differences ahead of time
(after - before) and put the differences into a list or you can put the after data into a first list and
the before data into a second list. Then go to a third list and arrow up to the name. Enter 1st list
name - 2nd list name. The calculator will do the subtraction and you will have the differences in
the third list.

NOTE: TI-83+ and TI-84: Use your list of differences as the data. Press STAT and arrow over to
TESTS. Press 2:T-Test. Arrow over to Data and press ENTER. Arrow down and enter 0 for $\mu_0$, the
name of the list where you put the data, and 1 for Freq:. Arrow down to $\mu$: and arrow over to
< $\mu_0$. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is 0.0094 and the test
statistic is -3.04. Do these instructions again except arrow to Draw (instead of Calculate). Press
ENTER.
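
The calculator steps above can also be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than the textbook's method: the helper names `t_pdf` and `t_upper_tail` are hypothetical, and the tail area is approximated by Simpson's rule on the Student-t density instead of a table or built-in distribution function.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) with Simpson's rule on [0, |t|] (steps is even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the distribution lies above 0; subtract the area on [0, |t|].
    return 0.5 - total * h / 3


# Data from Example 10.5 (Table 10.3)
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]
diffs = [a - b for a, b in zip(after, before)]   # after - before

n = len(diffs)
mean_d = statistics.mean(diffs)      # sample mean of the differences
sd_d = statistics.stdev(diffs)       # sample standard deviation (n - 1 divisor)
t_stat = (mean_d - 0) / (sd_d / math.sqrt(n))

# Left-tailed test (H_a: mu_d < 0): p-value = P(T < t_stat)
tail = t_upper_tail(t_stat, n - 1)
p_value = tail if t_stat < 0 else 1 - tail

print(f"mean = {mean_d:.3f}, sd = {sd_d:.3f}")
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

The results should be close to the values in the example (test statistic about -3.04, p-value about 0.0095); small differences in the last digit reflect rounding.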



Example 10.6 

A college football coach was interested in whether the college's strength development class in- 
creased his players' maximum lift (in pounds) on the bench press exercise. He asked 4 of his 
players to participate in a study. The amount of weight they could each lift was recorded before 
they took the strength development class. After completing the class, the amount of weight they 
could each lift was again measured. The data are as follows: 



Weight (in pounds)                           Player 1   Player 2   Player 3   Player 4
Amount of weight lifted prior to the class   205        241        338        368
Amount of weight lifted after the class      295        252        330        360



Table 10.5 

The coach wants to know if the strength development class makes his players stronger, on 
average. 

Problem (Solution on p. 450.) 

Record the differences data. Calculate the differences by subtracting the amount of weight lifted 
prior to the class from the weight lifted after completing the class. The data for the differences are: 
{90, 11, -8, -8}. The differences have a normal distribution. 

Using the differences data, calculate the sample mean and the sample standard deviation. 

$\bar{x}_d = 21.3$, $s_d = 46.7$
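
These summary values can be verified with a few lines of Python. The sketch below uses only the standard-library `statistics` module; the variable names are illustrative. The unrounded mean is 21.25, which the text reports as 21.3.

```python
import math
import statistics

# Data from Table 10.5 (pounds lifted by each of the 4 players)
before = [205, 241, 338, 368]
after = [295, 252, 330, 360]

# Differences: after the class minus prior to the class
diffs = [a - b for a, b in zip(after, before)]

mean_d = statistics.mean(diffs)    # sample mean of the differences
sd_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 divisor)
t_stat = mean_d / (sd_d / math.sqrt(len(diffs)))   # t-score with mu_d = 0

print(diffs)
print(f"mean = {mean_d:.2f}, sd = {sd_d:.2f}, t = {t_stat:.2f}")
```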

Using the difference data, this becomes a test of a single (fill in the blank). 

Define the random variable: $\bar{X}_d$ = mean difference in the maximum lift per player.

The distribution for the hypothesis test is $t_3$.

$H_0: \mu_d \leq 0$, $H_a: \mu_d > 0$

Graph (Figure 10.5): Student-t curve with horizontal axis $\bar{X}_d$ and the right tail shaded;
p-value = 0.2150.



Calculate the p-value: The p-value is 0.2150 

Decision: If the level of significance is 5%, the decision is to not reject the null hypothesis because
$\alpha$ < p-value.

What is the conclusion? 

Example 10.7 

Seven eighth graders at Kennedy Middle School measured how far they could push the shot-put 
with their dominant (writing) hand and their weaker (non-writing) hand. They thought that they 
could push equal distances with either hand. The following data were collected.



Distance (in feet) using   Student 1   Student 2   Student 3   Student 4   Student 5   Student 6   Student 7
Dominant Hand              30          26          34          17          19          26          20
Weaker Hand                28          14          27          18          17          26          16



Table 10.6 

Problem (Solution on p. 450.) 

Conduct a hypothesis test to determine whether the mean difference in distances between the 
children's dominant versus weaker hands is significant. 

HINT: use a t-test on the difference data. Assume the differences have a normal distribution. The 
random variable is the mean difference. 



CHECK: The test statistic is 2.18 and the p-value is 0.0716. 
What is your conclusion? 
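
The test statistic in the CHECK can be reproduced with a short Python sketch (standard library only; the variable names are illustrative, and the differences are taken as dominant minus weaker). The p-value itself still requires a Student-t table or calculator with 6 degrees of freedom.

```python
import math
import statistics

# Data from Table 10.6 (distance in feet for each of the 7 students)
dominant = [30, 26, 34, 17, 19, 26, 20]
weaker = [28, 14, 27, 18, 17, 26, 16]

# Paired differences: dominant hand minus weaker hand
diffs = [d - w for d, w in zip(dominant, weaker)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)              # n - 1 divisor
t_stat = mean_d / (sd_d / math.sqrt(n))     # hypothesized mean difference is 0

print(diffs)
print(f"t = {t_stat:.2f} with df = {n - 1}")
```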

10.6 Summary of Types of Hypothesis Tests 6

Two Population Means 

• Populations are independent and population standard deviations are unknown. 

• Populations are independent and population standard deviations are known (not likely). 

Matched or Paired Samples 

• Two samples are drawn from the same set of objects. 

• Samples are dependent. 

Two Population Proportions 

• Populations are independent. 



6 This content is available online at <http://cnx.org/content/m17044/1.5/>.

10.7 Practice 1: Hypothesis Testing for Two Proportions 7

10.7.1 Student Learning Outcomes 

• The student will conduct a hypothesis test of two proportions. 



10.7.2 Given 

In the recent Census, 3 percent of the U.S. population reported being of two or more
races. However, the percent varies tremendously from state to state. (Source:
http://www.census.gov/prod/cen2010/briefs/c2010br-02.pdf) Suppose that two random surveys
are conducted. In the first random survey, out of 1000 North Dakotans, only 9 people reported being of 
two or more races. In the second random survey, out of 500 Nevadans, 17 people reported being of two 
or more races. Conduct a hypothesis test to determine if the population percents are the same for the two 
states or if the percent for Nevada is statistically higher than for North Dakota. 

10.7.3 Hypothesis Testing: Two Proportions 

Exercise 10.7.1 (Solution on p. 450.) 

Is this a test of means or proportions? 

Exercise 10.7.2 (Solution on p. 450.) 

State the null and alternative hypotheses. 

a. $H_0$:

b. $H_a$:

Exercise 10.7.3 (Solution on p. 450.) 

Is this a right-tailed, left-tailed, or two-tailed test? How do you know? 

Exercise 10.7.4 

What is the Random Variable of interest for this test? 

Exercise 10.7.5 

In words, define the Random Variable for this test. 

Exercise 10.7.6 (Solution on p. 450.) 

Which distribution (Normal or Student's-t) would you use for this hypothesis test?

Exercise 10.7.7 

Explain why you chose the distribution you did for the above question. 

Exercise 10.7.8 (Solution on p. 450.) 

Calculate the test statistic. 

Exercise 10.7.9 

Sketch a graph of the situation. Mark the hypothesized difference and the sample difference. 
Shade the area corresponding to the p-value.



7 This content is available online at <http://cnx.org/content/m17027/1.13/>.

Figure 10.6 (blank axis for the sketch; horizontal axis labeled $P'_N - P'_{ND}$)



Exercise 10.7.10 (Solution on p. 450.) 

Find the p-value:

Exercise 10.7.11 (Solution on p. 450.) 

At a pre-conceived $\alpha = 0.05$, what is your:

a. Decision: 

b. Reason for the decision: 

c. Conclusion (write out in a complete sentence): 



10.7.4 Discussion Question 

Exercise 10.7.12 

Does it appear that the proportion of Nevadans who are two or more races is higher than the 
proportion of North Dakotans? Why or why not? 

10.8 Practice 2: Hypothesis Testing for Two Averages 8

10.8.1 Student Learning Outcome 

• The student will conduct a hypothesis test of two means. 

10.8.2 Given 

The U.S. Center for Disease Control reports that the mean life expectancy for whites born in 1900 was
47.6 years and for nonwhites it was 33.0 years. (http://www.cdc.gov/nchs/data/dvs/nvsr53_06t12.pdf)
Suppose that you randomly survey death records for people born in 1900 in a certain county. Of the 124 
whites, the mean life span was 45.3 years with a standard deviation of 12.7 years. Of the 82 nonwhites, the 
mean life span was 34.1 years with a standard deviation of 15.6 years. Conduct a hypothesis test to see if 
the mean life spans in the county were the same for whites and nonwhites. 

10.8.3 Hypothesis Testing: Two Means 

Exercise 10.8.1 (Solution on p. 451.) 

Is this a test of means or proportions? 

Exercise 10.8.2 (Solution on p. 451.) 

State the null and alternative hypotheses. 

a. $H_0$:

b. $H_a$:

Exercise 10.8.3 (Solution on p. 451.) 

Is this a right-tailed, left-tailed, or two-tailed test? How do you know? 

Exercise 10.8.4 (Solution on p. 451.) 

What is the Random Variable of interest for this test? 

Exercise 10.8.5 (Solution on p. 451.) 

In words, define the Random Variable of interest for this test. 

Exercise 10.8.6 

Which distribution (Normal or Student's-t) would you use for this hypothesis test?

Exercise 10.8.7 

Explain why you chose the distribution you did for the above question. 

Exercise 10.8.8 (Solution on p. 451.) 

Calculate the test statistic. 

Exercise 10.8.9 

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized difference and 
the sample difference. Shade the area corresponding to the p-value.



8 This content is available online at <http://cnx.org/content/m17039/1.12/>.

Figure 10.7



Exercise 10.8.10 (Solution on p. 451.) 

Find the p-value:

Exercise 10.8.11 (Solution on p. 451.) 

At a pre-conceived $\alpha = 0.05$, what is your:

a. Decision: 

b. Reason for the decision: 

c. Conclusion (write out in a complete sentence): 



10.8.4 Discussion Question 

Exercise 10.8.12 

Does it appear that the means are the same? Why or why not? 

10.9 Homework 9

For questions Exercise 10.9.1 - Exercise 10.9.10, indicate which of the following choices best identifies the 
hypothesis test. 

A. Independent group means, population standard deviations and/or variances known 

B. Independent group means, population standard deviations and/or variances unknown

C. Matched or paired samples 

D. Single mean 

E. 2 proportions 

F. Single proportion 

Exercise 10.9.1 (Solution on p. 451.) 

A powder diet is tested on 49 people and a liquid diet is tested on 36 different people. The pop- 
ulation standard deviations are 2 pounds and 3 pounds, respectively. Of interest is whether the 
liquid diet yields a higher mean weight loss than the powder diet. 

Exercise 10.9.2 

A new chocolate bar is taste-tested on consumers. Of interest is whether the proportion of children 
that like the new chocolate bar is greater than the proportion of adults that like it. 

Exercise 10.9.3 (Solution on p. 451.) 

The mean number of English courses taken in a two-year time period by male and female college 
students is believed to be about the same. An experiment is conducted and data are collected from 
9 males and 16 females. 

Exercise 10.9.4 

A football league reported that the mean number of touchdowns per game was 5. A study is done 
to determine if the mean number of touchdowns has decreased. 

Exercise 10.9.5 (Solution on p. 451.) 

A study is done to determine if students in the California state university system take longer to 
graduate than students enrolled in private universities. 100 students from both the California state 
university system and private universities are surveyed. From years of research, it is known that 
the population standard deviations are 1.5811 years and 1 year, respectively. 

Exercise 10.9.6 

According to a YWCA Rape Crisis Center newsletter, 75% of rape victims know their attackers. A 
study is done to verify this. 

Exercise 10.9.7 (Solution on p. 451.) 

According to a recent study, U.S. companies have a mean maternity leave of six weeks.

Exercise 10.9.8 

A recent drug survey showed an increase in use of drugs and alcohol among local high school 
students as compared to the national percent. Suppose that a survey of 100 local youths and 100 
national youths is conducted to see if the proportion of drug and alcohol use is higher locally than 
nationally. 

Exercise 10.9.9 (Solution on p. 451.) 

A new SAT study course is tested on 12 individuals. Pre-course and post-course scores are 
recorded. Of interest is the mean increase in SAT scores. 

Exercise 10.9.10 

University of Michigan researchers reported in the Journal of the National Cancer Institute that
quitting smoking is especially beneficial for those under age 49. In this American Cancer Society
study, the risk (probability) of dying of lung cancer was about the same as for those who had never
smoked.

9 This content is available online at <http://cnx.org/content/m17023/1.21/>.



10.9.1 



DIRECTIONS: For each of the word problems, use a solution sheet to do the hypothesis test. The 
solution sheet is found in 14. Appendix (online book version: the link is "Solution Sheets"; PDF 
book version: look under 14.5 Solution Sheets). Please feel free to make copies of the solution 
sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files. 

NOTE: If you are using a Student's-t distribution for a homework problem below, including for
paired data, you may assume that the underlying population is normally distributed. (In general, 
you must first prove that assumption, though.) 

Exercise 10.9.11 (Solution on p. 451.) 

A powder diet is tested on 49 people and a liquid diet is tested on 36 different people. Of interest
is whether the liquid diet yields a higher mean weight loss than the powder diet. The powder diet
group had a mean weight loss of 42 pounds with a standard deviation of 12 pounds. The liquid
diet group had a mean weight loss of 45 pounds with a standard deviation of 14 pounds.

Exercise 10.9.12 

The mean number of English courses taken in a two-year time period by male and female college 
students is believed to be about the same. An experiment is conducted and data are collected from 
29 males and 16 females. The males took an average of 3 English courses with a standard deviation 
of 0.8. The females took an average of 4 English courses with a standard deviation of 1.0. Are the 
means statistically the same? 

Exercise 10.9.13 (Solution on p. 451.) 

A study is done to determine if students in the California state university system take longer 
to graduate, on average, than students enrolled in private universities. 100 students from both 
the California state university system and private universities are surveyed. Suppose that from 
years of research, it is known that the population standard deviations are 1.5811 years and 1 year, 
respectively. The following data are collected. The California state university system students 
took on average 4.5 years with a standard deviation of 0.8. The private university students took 
on average 4.1 years with a standard deviation of 0.3. 

Exercise 10.9.14 

A new SAT study course is tested on 12 individuals. Pre-course and post-course scores are 
recorded. Of interest is the mean increase in SAT scores. The following data are collected: 

Pre-course score   Post-course score
1200               1300
960                920
1010               1100
840                880
1100               1070
1250               1320
860                860
1330               1370
790                770
990                1040
1110               1200
740                850

Table 10.7 



Exercise 10.9.15 (Solution on p. 451.) 

A recent drug survey showed an increase in use of drugs and alcohol among local high school 
seniors as compared to the national percent. Suppose that a survey of 100 local seniors and 100 
national seniors is conducted to see if the proportion of drug and alcohol use is higher locally than 
nationally. Locally, 65 seniors reported using drugs or alcohol within the past month, while 60 
national seniors reported using them. 

Exercise 10.9.16 

A student at a four-year college claims that mean enrollment at four-year colleges is higher than 
at two-year colleges in the United States. Two surveys are conducted. Of the 35 two-year colleges 
surveyed, the mean enrollment was 5068 with a standard deviation of 4777. Of the 35 four-year 
colleges surveyed, the mean enrollment was 5466 with a standard deviation of 8191. (Source: 
Microsoft Bookshelf) 

Exercise 10.9.17 (Solution on p. 451.) 

A study was conducted by the U.S. Army to see if applying antiperspirant to soldiers' feet for a 
few days before a major hike would help cut down on the number of blisters soldiers had on their 
feet. In the experiment, for three nights before they went on a 13-mile hike, a group of 328 West 
Point cadets put an alcohol-based antiperspirant on their feet. A "control group" of 339 soldiers 
put on a similar, but inactive, preparation on their feet. On the day of the hike, the temperature 
reached 83 °F. At the end of the hike, 21% of the soldiers who had used the antiperspirant and 48%
of the control group had developed foot blisters. Conduct a hypothesis test to see if the proportion
of soldiers using the antiperspirant who developed blisters was significantly lower than the
proportion in the control group. (Source: U.S.
Army study reported in Journal of the American Academy of Dermatologists)

Exercise 10.9.18 

We are interested in whether the proportions of female suicide victims for ages 15 to 24 are the 
same for the white and the black races in the United States. We randomly pick one year, 1992, 
to compare the races. The number of suicides estimated in the United States in 1992 for white 
females is 4930. 580 were aged 15 to 24. The estimate for black females is 330. 40 were aged 15 to 
24. We will let female suicide victims be our population. (Source: the National Center for Health 
Statistics, U.S. Dept. of Health and Human Services) 

Exercise 10.9.19 (Solution on p. 452.)

At Rachel's 11th birthday party, 8 girls were timed to see how long (in seconds) they could hold 
their breath in a relaxed position. After a two-minute rest, they timed themselves while jumping. 
The girls thought that the mean difference between their jumping and relaxed times would be 0. 
Test their hypothesis. 



Relaxed time (seconds)   Jumping time (seconds)
26                       21
47                       40
30                       28
22                       21
23                       25
45                       43
37                       35
29                       32



Table 10.8 



Exercise 10.9.20 

Elizabeth Mjelde, an art history professor, was interested in whether the value from the Golden
Ratio formula, $\frac{\text{larger + smaller dimension}}{\text{larger dimension}}$, was the same in the
Whitney Exhibit for works from 1900 - 1919 as for works from 1920 - 1942. 37 early works were
sampled. They averaged 1.74 with
a standard deviation of 0.11. 65 of the later works were sampled. They averaged 1.746 with a 
standard deviation of 0.1064. Do you think that there is a significant difference in the Golden 
Ratio calculation? (Source: data from Whitney Exhibit on loan to San Jose Museum of Art) 

Exercise 10.9.21 (Solution on p. 452.) 

One of the questions in a study of marital satisfaction of dual-career couples was to rate the state- 
ment, "I'm pleased with the way we divide the responsibilities for childcare." The ratings went 
from 1 (strongly agree) to 5 (strongly disagree). Below are ten of the paired responses for hus- 
bands and wives. Conduct a hypothesis test to see if the mean difference in the husband's versus 
the wife's satisfaction level is negative (meaning that, within the partnership, the husband is hap- 
pier than the wife). 



Wife's score      2   2   3   3   4   2   1   1   2   4
Husband's score   2   2   1   3   2   1   1   1   2   4



Table 10.9 



Exercise 10.9.22 

Ten individuals went on a low-fat diet for 12 weeks to lower their cholesterol. Evaluate the data 
below. Do you think that their cholesterol levels were significantly lowered? 

Starting cholesterol level   Ending cholesterol level
140                          140
220                          230
110                          120
240                          220
200                          190
180                          150
190                          200
360                          300
280                          300
260                          240



Table 10.10 

Exercise 10.9.23 (Solution on p. 452.) 

Mean entry level salaries for college graduates with mechanical engineering degrees and electrical
engineering degrees are believed to be approximately the same. (Source:
http://www.graduatingengineer.com 10 ). A recruiting office thinks that the mean mechanical engineering
salary is actually lower than the mean electrical engineering salary. The recruiting office randomly
surveys 50 entry level mechanical engineers and 60 entry level electrical engineers. Their
mean salaries were $46,100 and $46,700, respectively. Their standard deviations were $3450 and 
$4210, respectively. Conduct a hypothesis test to determine if you agree that the mean entry level 
mechanical engineering salary is lower than the mean entry level electrical engineering salary. 

Exercise 10.9.24 

A recent year was randomly picked from 1985 to the present. In that year, there were 2051 Hispanic 
students at Cabrillo College out of a total of 12,328 students. At Lake Tahoe College, there were 
321 Hispanic students out of a total of 2441 students. In general, do you think that the percent 
of Hispanic students at the two colleges is basically the same or different? (Source: Chancellor's 
Office, California Community Colleges, November 1994) 

Exercise 10.9.25 (Solution on p. 452.) 

Eight runners were convinced that the mean difference in their individual times for running one 
mile versus race walking one mile was at most 2 minutes. Below are their times. Do you agree 
that the mean difference is at most 2 minutes?

Running time (minutes)   Race walking time (minutes)
5.1                      7.3
5.6                      9.2
6.2                      10.4
4.8                      6.9
7.1                      8.9
4.2                      9.5
6.1                      9.4
4.4                      7.9

Table 10.11

10 http://www.graduatingengineer.com/



Exercise 10.9.26 

Marketing companies have collected data implying that teenage girls use more ring tones on their 
cellular phones than teenage boys do. In one particular study of 40 randomly chosen teenage girls 
and boys (20 of each) with cellular phones, the mean number of ring tones for the girls was 3.2 
with a standard deviation of 1.5. The mean for the boys was 1.7 with a standard deviation of 0.8. 
Conduct a hypothesis test to determine if the means are approximately the same or if the girls' 
mean is higher than the boys' mean. 

Exercise 10.9.27 (Solution on p. 452.) 

While her husband spent 2½ hours picking out new speakers, a statistician decided to determine
whether the percent of men who enjoy shopping for electronic equipment is higher than the percent
of women who enjoy shopping for electronic equipment. The population was Saturday afternoon
shoppers. Out of 67 men, 24 said they enjoyed the activity. 8 of the 24 women surveyed
claimed to enjoy the activity. Interpret the results of the survey.

Exercise 10.9.28 

We are interested in whether children's educational computer software costs less, on average, 
than children's entertainment software. 36 educational software titles were randomly picked from 
a catalog. The mean cost was $31.14 with a standard deviation of $4.69. 35 entertainment software 
titles were randomly picked from the same catalog. The mean cost was $33.86 with a standard 
deviation of $10.87. Decide whether children's educational software costs less, on average, than 
children's entertainment software. (Source: Educational Resources, December catalog) 

Exercise 10.9.29 (Solution on p. 452.) 

Parents of teenage boys often complain that auto insurance costs more, on average, for teenage 
boys than for teenage girls. A group of concerned parents examines a random sample of insurance 
bills. The mean annual cost for 36 teenage boys was $679. For 23 teenage girls, it was $559. From 
past years, it is known that the population standard deviation for each group is $180. Determine 
whether or not you believe that the mean cost for auto insurance for teenage boys is greater than 
that for teenage girls. 

Exercise 10.9.30 

A group of transfer bound students wondered if they will spend the same mean amount on texts 
and supplies each year at their four-year university as they have at their community college. They 
conducted a random survey of 54 students at their community college and 66 students at their 
local four-year university. The sample means were $947 and $1011, respectively. The population 
standard deviations are known to be $254 and $87, respectively. Conduct a hypothesis test to 
determine if the means are statistically the same. 

Exercise 10.9.31 (Solution on p. 452.)

Joan Nguyen recently claimed that the proportion of college-age males with at least one pierced 
ear is as high as the proportion of college-age females. She conducted a survey in her classes. Out 
of 107 males, 20 had at least one pierced ear. Out of 92 females, 47 had at least one pierced ear. Do 
you believe that the proportion of males has reached the proportion of females? 

Exercise 10.9.32 

Some manufacturers claim that non-hybrid sedan cars have a lower mean miles per gallon (mpg) 
than hybrid ones. Suppose that consumers test 21 hybrid sedans and get a mean of 31 mpg with a 
standard deviation of 7 mpg. Thirty-one non-hybrid sedans get a mean of 22 mpg with a standard 
deviation of 4 mpg. Suppose that the population standard deviations are known to be 6 and 3, 
respectively. Conduct a hypothesis test to evaluate the manufacturers' claim.

Questions Exercise 10.9.33 - Exercise 10.9.37 refer to Terri Vogel's data set (see Table of Contents).

Exercise 10.9.33 (Solution on p. 452.) 

Using the data from Lap 1 only, conduct a hypothesis test to determine if the mean time for com- 
pleting a lap in races is the same as it is in practices. 

Exercise 10.9.34 

Repeat the test in Exercise 10.9.33, but use Lap 5 data this time. 

Exercise 10.9.35 (Solution on p. 453.) 

Repeat the test in Exercise 10.9.33, but this time combine the data from Laps 1 and 5. 

Exercise 10.9.36 

In 2 - 3 complete sentences, explain in detail how you might use Terri Vogel's data to answer the 
following question. "Does Terri Vogel drive faster in races than she does in practices?"

Exercise 10.9.37 (Solution on p. 453.)

Is the proportion of race laps Terri completes slower than 130 seconds less than the proportion of 
practice laps she completes slower than 135 seconds? 

Exercise 10.9.38 

"To Breakfast or Not to Breakfast?" by Richard Ayore 

In the American society, birthdays are one of those days that everyone looks forward to. People of 
different ages and peer groups gather to mark the 18th, 20th, . . . birthdays. During this time, one 
looks back to see what he or she had achieved for the past year, and also focuses ahead for more 
to come. 

If, by any chance, I am invited to one of these parties, my experience is always different. Instead 
of dancing around with my friends while the music is booming, I get carried away by memories 
of my family back home in Kenya. I remember the good times I had with my brothers and sister 
while we did our daily routine. 

Every morning, I remember we went to the shamba (garden) to weed our crops. I remember one 
day arguing with my brother as to why he always remained behind just to join us an hour later. In 
his defense, he said that he preferred waiting for breakfast before he came to weed. He said, "This 
is why I always work more hours than you guys!" 

And so, to prove him wrong or right, we decided to give it a try. One day we went to work as usual
without breakfast, and recorded the time we could work before getting tired and stopping. On 
the next day, we all ate breakfast before going to work. We recorded how long we worked again 
before getting tired and stopping. Of interest was our mean increase in work time. Though not 
sure, my brother insisted that it is more than two hours. Using the data below, solve our problem. 

Work hours with breakfast    Work hours without breakfast
8                            6
7                            5
9                            5
5                            4
9                            7
8                            7
10                           7
7                            5
6                            6
9                            5

Table 10.12 



10.9.2 Try these multiple choice questions. 

For questions Exercise 10.9.39 - Exercise 10.9.40, use the following information. 

A new AIDS prevention drug was tried on a group of 224 HIV positive patients. Forty-five (45) patients 
developed AIDS after four years. In a control group of 224 HIV positive patients, 68 developed AIDS after 
four years. We want to test whether the method of treatment reduces the proportion of patients that develop 
AIDS after four years or if the proportions of the treated group and the untreated group stay the same. 



Let the subscript t = treated patient and ut = untreated patient. 

Exercise 10.9.39 (Solution on p. 453.) 

The appropriate hypotheses are: 

A. $H_0: p_t < p_{ut}$ and $H_a: p_t \ge p_{ut}$ 

B. $H_0: p_t \le p_{ut}$ and $H_a: p_t > p_{ut}$ 

C. $H_0: p_t = p_{ut}$ and $H_a: p_t \ne p_{ut}$ 

D. $H_0: p_t = p_{ut}$ and $H_a: p_t < p_{ut}$ 

Exercise 10.9.40 (Solution on p. 453.) 

If the p-value is 0.0062, what is the conclusion (use $\alpha = 0.05$)? 

A. The method has no effect. 

B. There is sufficient evidence to conclude that the method reduces the proportion of HIV positive 
patients that develop AIDS after four years. 

C. There is sufficient evidence to conclude that the method increases the proportion of HIV positive 
patients that develop AIDS after four years. 

D. There is insufficient evidence to conclude that the method reduces the proportion of HIV positive 
patients that develop AIDS after four years. 

Exercise 10.9.41 (Solution on p. 453.) 

Lesley E. Tan investigated the relationship between left-handedness and right-handedness and 
motor competence in preschool children. Random samples of 41 left-handers and 41 right-handers 
were given several tests of motor skills to determine if there is evidence of a difference between the 
children based on this experiment. The experiment produced the means and standard deviations 
shown below. Determine the appropriate test and best distribution to use for that test. 





                             Left-handed    Right-handed
Sample size                  41             41
Sample mean                  97.5           98.1
Sample standard deviation    17.5           19.2

Table 10.13 

A. Two independent means, normal distribution 

B. Two independent means, student's-t distribution 

C. Matched or paired samples, student's-t distribution 

D. Two population proportions, normal distribution 

For questions Exercise 10.9.42 - Exercise 10.9.43, use the following information. 

An experiment is conducted to show that blood pressure can be consciously reduced in people trained in a 
"biofeedback exercise program." Six (6) subjects were randomly selected and the blood pressure measurements 
were recorded before and after the training. The difference between blood pressures was calculated 
(after - before), producing the following results: $\bar{x}_d = -10.2$, $s_d = 8.4$. Using the data, test the hypothesis 
that the blood pressure has decreased after the training. 

Exercise 10.9.42 (Solution on p. 453.) 

The distribution for the test is 



A. $t_5$ 

B. $t_6$ 

C. $N(-10.2, 8.4)$ 

D. $N\left(-10.2, \frac{8.4}{\sqrt{6}}\right)$ 



Exercise 10.9.43 (Solution on p. 453.) 

If $\alpha = 0.05$, the p-value and the conclusion are 



A. 0.0014; There is sufficient evidence to conclude that the blood pressure decreased after the 
training 

B. 0.0014; There is sufficient evidence to conclude that the blood pressure increased after the 
training 

C. 0.0155; There is sufficient evidence to conclude that the blood pressure decreased after the 
training 

D. 0.0155; There is sufficient evidence to conclude that the blood pressure increased after the 
training 

For questions Exercise 10.9.44- Exercise 10.9.45, use the following information. 

The Eastern and Western Major League Soccer conferences have a new Reserve Division that allows new 
players to develop their skills. Data for a randomly picked date showed the following annual goals. 

Western             Eastern
Los Angeles 9       D.C. United 9
FC Dallas 3         Chicago 8
Chivas USA 4        Columbus 7
Real Salt Lake 3    New England 6
Colorado 4          MetroStars 5
San Jose 4          Kansas City 3

Table 10.14 

Conduct a hypothesis test to determine if the Western Reserve Division teams score, on average, fewer goals 
than the Eastern Reserve Division teams. Subscripts: 1 Western Reserve Division (W); 2 Eastern Reserve 
Division (E) 

Exercise 10.9.44 (Solution on p. 453.) 

The exact distribution for the hypothesis test is: 

A. The normal distribution. 

B. The student's-t distribution. 

C. The uniform distribution. 

D. The exponential distribution. 



Exercise 10.9.45 (Solution on p. 453.) 

If the level of significance is 0.05, the conclusion is: 



A. There is sufficient evidence to conclude that the W Division teams score, on average, fewer 
goals than the E teams. 

B. There is insufficient evidence to conclude that the W Division teams score, on average, more 
goals than the E teams. 

C. There is insufficient evidence to conclude that the W teams score, on average, fewer goals than 
the E teams score. 

D. Unable to determine. 



Questions Exercise 10.9.46 - Exercise 10.9.48 refer to the following. 

Neuroinvasive West Nile virus refers to a severe disease that affects a person's nervous system. It 
is spread by the Culex species of mosquito. In the United States in 2010 there were 629 reported 
cases of neuroinvasive West Nile virus out of a total of 1021 reported cases and there were 486 
neuroinvasive reported cases out of a total of 712 cases reported in 2011. Is the 2011 proportion of 
neuroinvasive West Nile virus cases more than the 2010 proportion of neuroinvasive West Nile virus 
cases? Using a 1% level of significance, conduct an appropriate hypothesis test. (Source: 
http://www.cdc.gov/ncidod/dvbid/westnile/index.htm) 

• "2011" subscript: 2011 group. 

• "2010" subscript: 2010 group 



Exercise 10.9.46 (Solution on p. 453.) 

This is: 

A. a test of two proportions 

B. a test of two independent means 

C. a test of a single mean 

D. a test of matched pairs. 

Exercise 10.9.47 (Solution on p. 453.) 

An appropriate null hypothesis is: 

A. $p_{2011} \le p_{2010}$ 

B. $p_{2011} \ge p_{2010}$ 

C. $\mu_{2011} \le \mu_{2010}$ 

D. $p_{2011} > p_{2010}$ 



Exercise 10.9.48 (Solution on p. 453.) 

The p-value is 0.0022. At a 1% level of significance, the appropriate conclusion is 

A. There is sufficient evidence to conclude that the proportion of people in the United States in 
2011 that got neuroinvasive West Nile disease is less than the proportion of people in the 
United States in 2010 that got neuroinvasive West Nile disease. 

B. There is insufficient evidence to conclude that the proportion of people in the United States in 
2011 that got neuroinvasive West Nile disease is more than the proportion of people in the 
United States in 2010 that got neuroinvasive West Nile disease. 

C. There is insufficient evidence to conclude that the proportion of people in the United States 
in 2011 that got neuroinvasive West Nile disease is less than the proportion of people in the 
United States in 2010 that got neuroinvasive West Nile disease. 

D. There is sufficient evidence to conclude that the proportion of people in the United States in 
2011 that got neuroinvasive West Nile disease is more than the proportion of people in the 
United States in 2010 that got neuroinvasive West Nile disease. 

Questions Exercise 10.9.49 and Exercise 10.9.50 refer to the following: 

A golf instructor is interested in determining if her new technique for improving players' golf scores is 
effective. She takes four (4) new students. She records their 18-hole scores before learning the technique 
and then after having taken her class. She conducts a hypothesis test. The data are as follows. 





                           Player 1    Player 2    Player 3    Player 4
Mean score before class    83          78          93          87
Mean score after class     80          80          86          86

Table 10.15 



Exercise 10.9.49 (Solution on p. 453.) 

This is: 

A. a test of two independent means 

B. a test of two proportions 

C. a test of a single proportion 

D. a test of matched pairs. 

Exercise 10.9.50 (Solution on p. 453.) 

The correct decision is: 

A. Reject $H_0$ 

B. Do not reject $H_0$ 

Questions Exercise 10.9.51 and Exercise 10.9.52 refer to the following: 

Suppose a statistics instructor believes that there is no significant difference between the mean class scores 
of statistics day students on Exam 2 and statistics night students on Exam 2. She takes random samples 
from each of the populations. The mean and standard deviation for 35 statistics day students were 75.86 
and 16.91. The mean and standard deviation for 37 statistics night students were 75.41 and 19.73. The "day" 
subscript refers to the statistics day students. The "night" subscript refers to the statistics night students. 

Exercise 10.9.51 (Solution on p. 453.) 

An appropriate alternate hypothesis for the hypothesis test is: 

A. $\mu_{day} > \mu_{night}$ 

B. $\mu_{day} < \mu_{night}$ 

C. $\mu_{day} = \mu_{night}$ 

D. $\mu_{day} \ne \mu_{night}$ 

Exercise 10.9.52 (Solution on p. 453.) 

A concluding statement is: 

A. There is sufficient evidence to conclude that statistics night students mean on Exam 2 is better 
than the statistics day students mean on Exam 2. 

B. There is insufficient evidence to conclude that the statistics day students mean on Exam 2 is 
better than the statistics night students mean on Exam 2. 

C. There is insufficient evidence to conclude that there is a significant difference between the 
means of the statistics day students and night students on Exam 2. 

D. There is sufficient evidence to conclude that there is a significant difference between the means 
of the statistics day students and night students on Exam 2. 

10.10 Review 12 

The next three questions refer to the following information: 
In a survey at Kirkwood Ski Resort the following information was recorded: 

Sport Participation by Age 

             0-10    11-20    21-40    40+
Ski          10      12       30       8
Snowboard    6       17       12       5

Table 10.16 

Suppose that one person from the above was randomly selected. 

Exercise 10.10.1 (Solution on p. 453.) 

Find the probability that the person was a skier or was age 11-20. 

Exercise 10.10.2 (Solution on p. 453.) 

Find the probability that the person was a snowboarder given he/she was age 21 - 40. 

Exercise 10.10.3 (Solution on p. 453.) 

Explain which of the following are true and which are false. 

a. Sport and Age are independent events. 

b. Ski and age 11 - 20 are mutually exclusive events. 

c. P (Ski and age 21 - 40) < P (Ski | age 21 - 40) 

d. P (Snowboard or age 0 - 10) < P (Snowboard | age 0 - 10) 

Exercise 10.10.4 (Solution on p. 454.) 

The average length of time a person with a broken leg wears a cast is approximately 6 weeks. 
The standard deviation is about 3 weeks. Thirty people who had recently healed from broken 
legs were interviewed. State the distribution that most accurately reflects total time to heal for the 
thirty people. 

Exercise 10.10.5 (Solution on p. 454.) 

The distribution for X is Uniform. What can we say for certain about the distribution for $\bar{X}$ when 
n = 1? 

A. The distribution for $\bar{X}$ is still Uniform with the same mean and standard dev. as the distribution 
for X. 

B. The distribution for $\bar{X}$ is Normal with a different mean and a different standard deviation than 
the distribution for X. 

C. The distribution for $\bar{X}$ is Normal with the same mean but a larger standard deviation than the 
distribution for X. 

D. The distribution for $\bar{X}$ is Normal with the same mean but a smaller standard deviation than 
the distribution for X. 



Exercise 10.10.6 (Solution on p. 454.) 

The distribution for X is uniform. What can we say for certain about the distribution for $\sum X$ 
when n = 50? 

A. The distribution for $\sum X$ is still uniform with the same mean and standard deviation as the 
distribution for X. 

B. The distribution for $\sum X$ is Normal with the same mean but a larger standard deviation than the 
distribution for X. 

C. The distribution for $\sum X$ is Normal with a larger mean and a larger standard deviation than the 
distribution for X. 

D. The distribution for $\sum X$ is Normal with the same mean but a smaller standard deviation than 
the distribution for X. 

12 This content is available online at <http://cnx.org/content/m17021/1.9/>. 

The next three questions refer to the following information: 

A group of students measured the lengths of all the carrots in a five-pound bag of baby carrots. They 
calculated the average length of baby carrots to be 2.0 inches with a standard deviation of 0.25 inches. 
Suppose we randomly survey 16 five-pound bags of baby carrots. 

Exercise 10.10.7 (Solution on p. 454.) 

State the approximate distribution for $\bar{X}$, the distribution for the average lengths of baby carrots 
in 16 five-pound bags. $\bar{X} \sim$ 

Exercise 10.10.8 

Explain why we cannot find the probability that one individual randomly chosen carrot is greater 
than 2.25 inches. 

Exercise 10.10.9 (Solution on p. 454.) 

Find the probability that $\bar{x}$ is between 2 and 2.25 inches. 

The next three questions refer to the following information: 

At the beginning of the term, the amount of time a student waits in line at the campus store is normally 
distributed with a mean of 5 minutes and a standard deviation of 2 minutes. 

Exercise 10.10.10 (Solution on p. 454.) 

Find the 90th percentile of waiting time in minutes. 

Exercise 10.10.11 (Solution on p. 454.) 

Find the median waiting time for one student. 

Exercise 10.10.12 (Solution on p. 454.) 

Find the probability that the average waiting time for 40 students is at least 4.5 minutes. 

10.11 Lab: Hypothesis Testing for Two Means and Two Proportions 

Class Time: 
Names: 

10.11.1 Student Learning Outcomes: 

• The student will select the appropriate distributions to use in each case. 

• The student will conduct hypothesis tests and interpret the results. 



10.11.2 Supplies: 

• The business section from two consecutive days' newspapers 

• 3 small packages of M&Ms® 

• 5 small packages of Reese's Pieces® 



10.11.3 Increasing Stocks Survey 

Look at yesterday's newspaper business section. Conduct a hypothesis test to determine if the proportion 
of New York Stock Exchange (NYSE) stocks that increased is greater than the proportion of NASDAQ stocks 
that increased. As randomly as possible, choose 40 NYSE stocks and 32 NASDAQ stocks and complete the 
following statements. 

1. $H_0$: 

2. $H_a$: 

3. In words, define the Random Variable. = 

4. The distribution to use for the test is: 



5. Calculate the test statistic using your data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance. 

a. Graph: 



3 This content is available online at <http://cnx.org/content/m17022/1.13/>. 

Figure 10.8 



b. Calculate the p-value: 

7. Do you reject or not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 



10.11.4 Decreasing Stocks Survey 

Randomly pick 8 stocks from the newspaper. Using two consecutive days' business sections, test whether 
the stocks went down, on average, for the second day. 

1. $H_0$: 

2. $H_a$: 

3. In words, define the Random Variable. = 

4. The distribution to use for the test is: 

5. Calculate the test statistic using your data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance. 

a. Graph: 

Figure 10.9 



b. Calculate the p-value: 

7. Do you reject or not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 



10.11.5 Candy Survey 

Buy three small packages of M&Ms and 5 small packages of Reese's Pieces (same net weight as the M&Ms). 
Test whether or not the mean number of candy pieces per package is the same for the two brands. 

1. $H_0$: 

2. $H_a$: 

3. In words, define the random variable. = 

4. What distribution should be used for this test? 

5. Calculate the test statistic using your data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance. 

a. Graph: 

Figure 10.10 



b. Calculate the p-value: 

7. Do you reject or not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 



10.11.6 Shoe Survey 

Test whether women have, on average, more pairs of shoes than men. Include all forms of sneakers, shoes, 
sandals, and boots. Use your class as the sample. 

1. $H_0$: 

2. $H_a$: 

3. In words, define the Random Variable. = 

4. The distribution to use for the test is: 

5. Calculate the test statistic using your data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance. 

a. Graph: 

Figure 10.11 



b. Calculate the p-value: 

7. Do you reject or not reject the null hypothesis? Why? 

8. Write a clear conclusion using a complete sentence. 

Solutions to Exercises in Chapter 10 

Solution to Example 10.2, Problem 1 (p. 417) 

two means 

Solution to Example 10.2, Problem 2 (p. 417) 

unknown 

Solution to Example 10.2, Problem 3 (p. 417) 

student's-t 

Solution to Example 10.2, Problem 4 (p. 417) 

$\bar{X}_A - \bar{X}_B$ 

Solution to Example 10.2, Problem 5 (p. 417) 

• $H_0: \mu_A \le \mu_B$ 

• $H_a: \mu_A > \mu_B$ 

Solution to Example 10.2, Problem 6 (p. 417) 

right 

Solution to Example 10.2, Problem 7 (p. 417) 

0.1928 

Solution to Example 10.2, Problem 8 (p. 417) 

Do not reject. 

Solution to Example 10.4, Problem (p. 420) 

The problem asks for a difference in proportions. 
Solution to Example 10.6, Problem (p. 424) 

means; At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that 
the strength development class helped to make the players stronger, on average. 
Solution to Example 10.7, Problem (p. 425) 

$H_0$: $\mu_d$ equals 0; $H_a$: $\mu_d$ does not equal 0; Do not reject the null; At a 5% significance level, from the 
sample data, there is not sufficient evidence to conclude that the mean difference in distances between the 
children's dominant versus weaker hands is significant (there is not sufficient evidence to show that the 
children could push the shot-put further with their dominant hand). Alpha and the p-value are close so the 
test is not strong. 

Solutions to Practice 1: Hypothesis Testing for Two Proportions 

Solution to Exercise 10.7.1 (p. 427) 

Proportions 

Solution to Exercise 10.7.2 (p. 427) 

a. $H_0: p_N = p_{ND}$ 
b. $H_a: p_N > p_{ND}$ 

Solution to Exercise 10.7.3 (p. 427) 

right-tailed 

Solution to Exercise 10.7.6 (p. 427) 

Normal 

Solution to Exercise 10.7.8 (p. 427) 

3.50 

Solution to Exercise 10.7.10 (p. 428) 

0.0002 

Solution to Exercise 10.7.11 (p. 428) 

a. Reject the null hypothesis 

Solutions to Practice 2: Hypothesis Testing for Two Averages 

Solution to Exercise 10.8.1 (p. 429) 
Means 
Solution to Exercise 10.8.2 (p. 429) 

a. $H_0: \mu_W = \mu_{NW}$ 

b. $H_a: \mu_W \ne \mu_{NW}$ 

Solution to Exercise 10.8.3 (p. 429) 
two-tailed 

Solution to Exercise 10.8.4 (p. 429) 
$\bar{X}_W - \bar{X}_{NW}$ 

Solution to Exercise 10.8.5 (p. 429) 

The difference between the mean life spans of whites and nonwhites. 
Solution to Exercise 10.8.8 (p. 429) 
5.42 

Solution to Exercise 10.8.10 (p. 430) 
0.0000 
Solution to Exercise 10.8.11 (p. 430) 

a. Reject the null hypothesis 

Solutions to Homework 

Solution to Exercise 10.9.1 (p. 431) 

A 

Solution to Exercise 10.9.3 (p. 431) 

B 

Solution to Exercise 10.9.5 (p. 431) 

A 

Solution to Exercise 10.9.7 (p. 431) 

D 

Solution to Exercise 10.9.9 (p. 431) 

C 

Solution to Exercise 10.9.11 (p. 432) 

d. $t_{68.44}$ 

e. -1.04 

f. 0.1519 

h. Decision: Do not reject null 

Solution to Exercise 10.9.13 (p. 432) 

Standard Normal 

e. z = 2.14 

f. 0.0163 

h. Decision: Reject null when $\alpha = 0.05$; Do not reject null when $\alpha = 0.01$ 

Solution to Exercise 10.9.15 (p. 433) 

e. 0.73 

f. 0.2326 

h. Decision: Do not reject null 

Solution to Exercise 10.9.17 (p. 433) 

e. -7.33 

f. 

h. Decision: Reject null 

Solution to Exercise 10.9.19 (p. 434) 

d. $t_7$ 

e. -1.51 

f. 0.1755 

h. Decision: Do not reject null 

Solution to Exercise 10.9.21 (p. 434) 

d. $t_9$ 

e. t = -1.86 

f. 0.0479 

h. Decision: Reject null, but run another test 

Solution to Exercise 10.9.23 (p. 435) 

d. $t_{108}$ 

e. t = -0.82 

f. 0.2066 

h. Decision: Do not reject null 

Solution to Exercise 10.9.25 (p. 435) 

d. $t_7$ 

e. t = 2.9850 

f. 0.0102 

h. Decision: Reject null; There is sufficient evidence to conclude that the mean difference is more than 2 
minutes. 

Solution to Exercise 10.9.27 (p. 436) 

e. 0.22 

f. 0.4133 

h. Decision: Do not reject null 

Solution to Exercise 10.9.29 (p. 436) 

e. z = 2.50 

f. 0.0063 

h. Decision: Reject null 

Solution to Exercise 10.9.31 (p. 437) 

e. -4.82 

f. 

h. Decision: Reject null 

Solution to Exercise 10.9.33 (p. 437) 

d. $t_{20.32}$ 

e. -4.70 

f. 0.0001 

h. Decision: Reject null 

Solution to Exercise 10.9.35 (p. 437) 

d. $t_{40.94}$ 

e. -5.08 

f. 

h. Decision: Reject null 

Solution to Exercise 10.9.37 (p. 437) 

e. -0.9223 

f. 0.1782 

h. Decision: Do not reject null 

Solution to Exercise 10.9.39 (p. 438) 
D 

Solution to Exercise 10.9.40 (p. 438) 
B 

Solution to Exercise 10.9.41 (p. 438) 
B 

Solution to Exercise 10.9.42 (p. 439) 
A 

Solution to Exercise 10.9.43 (p. 439) 
C 

Solution to Exercise 10.9.44 (p. 440) 
B 

Solution to Exercise 10.9.45 (p. 440) 
C 

Solution to Exercise 10.9.46 (p. 440) 
A 

Solution to Exercise 10.9.47 (p. 441) 
A 

Solution to Exercise 10.9.48 (p. 441) 
D 

Solution to Exercise 10.9.49 (p. 441) 
D 

Solution to Exercise 10.9.50 (p. 441) 
B 

Solution to Exercise 10.9.51 (p. 442) 
D 

Solution to Exercise 10.9.52 (p. 442) 
C 

Solutions to Review 

Solution to Exercise 10.10.1 (p. 443) 

$\frac{77}{100}$ 

Solution to Exercise 10.10.2 (p. 443) 

$\frac{12}{42}$ 

Solution to Exercise 10.10.3 (p. 443) 

a. False 

b. False 

c. True 

d. False 

Solution to Exercise 10.10.4 (p. 443) 
N (180, 16.43) 

Solution to Exercise 10.10.5 (p. 443) 
A 

Solution to Exercise 10.10.6 (p. 443) 
C 
Solution to Exercise 10.10.7 (p. 444) 

Solution to Exercise 10.10.9 (p. 444) 
0.5000 

Solution to Exercise 10.10.10 (p. 444) 
7.6 

Solution to Exercise 10.10.11 (p. 444) 
5 

Solution to Exercise 10.10.12 (p. 444) 
0.9431 



Chapter 11 

The Chi-Square Distribution 

11.1 The Chi-Square Distribution 1 
11.1.1 Student Learning Outcomes 

By the end of this chapter, the student should be able to: 

• Interpret the chi-square probability distribution as the sample size changes. 

• Conduct and interpret chi-square goodness-of-fit hypothesis tests. 

• Conduct and interpret chi-square test of independence hypothesis tests. 

• Conduct and interpret chi-square homogeneity hypothesis tests. 

• Conduct and interpret chi-square single variance hypothesis tests. 



11.1.2 Introduction 

Have you ever wondered if lottery numbers were evenly distributed or if some numbers occurred with a 
greater frequency? How about if the types of movies people preferred were different across different age 
groups? What about if a coffee machine was dispensing approximately the same amount of coffee each 
time? You could answer these questions by conducting a hypothesis test. 

You will now study a new distribution, one that is used to determine the answers to the above examples. 
This distribution is called the Chi-square distribution. 

In this chapter, you will learn the three major applications of the Chi-square distribution: 

• The goodness-of-fit test, which determines if data fit a particular distribution, such as with the lottery 
example 

• The test of independence, which determines if events are independent, such as with the movie 
example 

• The test of a single variance, which tests variability, such as with the coffee example 

NOTE: Though the Chi-square calculations depend on calculators or computers for most of the 
calculations, there is a table available (see the Table of Contents 15. Tables). TI-83+ and TI-84 
calculator instructions are included in the text. 



1 This content is available online at <http://cnx.org/content/m17048/1.9/>. 

11.1.3 Optional Collaborative Classroom Activity 

Look in the sports section of a newspaper or on the Internet for some sports data (baseball averages, 
basketball scores, golf tournament scores, football odds, swimming times, etc.). Plot a histogram and a boxplot 
using your data. See if you can determine a probability distribution that your data fits. Have a discussion 
with the class about your choice. 



11.2 Notation 2 

The notation for the chi-square distribution is: 

$\chi^2 \sim \chi^2_{df}$ 

where df = degrees of freedom, which depend on how chi-square is being used. (If you want to practice 
calculating chi-square probabilities, then use df = n - 1. The degrees of freedom for the three major uses 
are each calculated differently.) 

For the $\chi^2$ distribution, the population mean is $\mu = df$ and the population standard deviation is 
$\sigma = \sqrt{2 \cdot df}$. 

The random variable is shown as $\chi^2$ but may be any upper case letter. 

The random variable for a chi-square distribution with k degrees of freedom is the sum of k independent, 
squared standard normal variables. 

$\chi^2 = (Z_1)^2 + (Z_2)^2 + \ldots + (Z_k)^2$ 
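The definition above can be checked with a small simulation. The following is a minimal sketch, assuming Python's standard library only; the function names, the choice df = 5, the number of draws, and the seed are illustrative and not part of the text. It builds chi-square values as sums of squared standard normal draws and compares the simulated mean and standard deviation with df and the square root of 2·df.

```python
import math
import random


def simulate_chi_square(df, draws, seed=0):
    """Draw chi-square values as sums of df squared standard normal draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(draws)
    ]


def mean_and_sd(values):
    """Return the sample mean and the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


df = 5  # illustrative choice of degrees of freedom
sample_mean, sample_sd = mean_and_sd(simulate_chi_square(df, draws=20000))
print(f"simulated mean = {sample_mean:.2f}, theory = {df}")
print(f"simulated sd   = {sample_sd:.2f}, theory = {math.sqrt(2 * df):.2f}")
```

The simulated values should land close to, but not exactly on, the theoretical ones; a larger number of draws tightens the agreement.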

11.3 Facts About the Chi-Square Distribution 3 

1. The curve is nonsymmetrical and skewed to the right. 

2. There is a different chi-square curve for each df. 

2 This content is available online at <http://cnx.org/content/m17052/1.6/>. 
3 This content is available online at <http://cnx.org/content/m17045/1.6/>. 

Figure 11.1 (panel (a): df = 2) 



3. The test statistic for any test is always greater than or equal to zero. 

4. When df > 90, the chi-square curve approximates the normal. For $X \sim \chi^2_{1000}$ the mean, $\mu = df = 1000$ 
and the standard deviation, $\sigma = \sqrt{2 \cdot 1000} = 44.7$. Therefore, $X \sim N(1000, 44.7)$, approximately. 

5. The mean, $\mu$, is located just to the right of the peak. 




Figure 11.2 
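The mean and standard deviation used in the normal approximation above follow directly from the degrees of freedom. As a minimal sketch (the function name is hypothetical, not from the text), the two parameters for df = 1000 can be computed as:

```python
import math


def chi_square_mean_sd(df):
    """Mean and standard deviation of a chi-square distribution with df degrees of freedom."""
    return df, math.sqrt(2 * df)


mean, sd = chi_square_mean_sd(1000)
print(f"mean = {mean}, sd = {sd:.1f}")  # mean = 1000, sd = 44.7
```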



In the next sections, you will learn about four different applications of the Chi-Square Distribution. These 
hypothesis tests are almost always right-tailed tests. In order to understand why the tests are mostly right- 
tailed, you will need to look carefully at the actual definition of the test statistic. Think about the following 
while you study the next four sections. If the expected and observed values are "far" apart, then the test 
statistic will be "large" and we will reject in the right tail. The only way to obtain a test statistic very close to 
zero, would be if the observed and expected values are very, very close to each other. A left-tailed test could 
be used to determine if the fit were "too good." A "too good" fit might occur if data had been manipulated 
or invented. Think about the implications of right-tailed versus left-tailed hypothesis tests as you learn the 
applications of the Chi-Square Distribution. 

11.4 Goodness-of-Fit Test 4 

In this type of hypothesis test, you determine whether the data "fit" a particular distribution or not. For 
example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (mean- 
ing the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null 
and the alternate hypotheses for this test may be written in sentences or may be stated as equations or 
inequalities. 



The test statistic for a goodness-of-fit test is: 

$\sum_{k} \frac{(O - E)^2}{E}$    (11.1) 

where: 

• O = observed values (data) 

• E = expected values (from theory) 

• k = the number of different data cells or categories 

The observed values are the data values and the expected values are the values you would expect to get 
if the null hypothesis were true. There are k terms of the form $\frac{(O - E)^2}{E}$, one for each cell or category. 

The degrees of freedom are df = (number of categories - 1). 

The goodness-of-fit test is almost always right tailed. If the observed values and the corresponding 
expected values are not close to each other, then the test statistic can get very large and will be way out in the 
right tail of the chi-square curve. 

NOTE: The expected value for each cell needs to be at least 5 in order to use this test. 
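The test statistic, the degrees of freedom, and the minimum expected value rule can be sketched together in a few lines. This is an illustration under stated assumptions, not the text's own procedure: the function name is hypothetical, and the counts are invented for four equally likely categories.

```python
def goodness_of_fit_statistic(observed, expected, min_expected=5):
    """Return (test statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    if any(e < min_expected for e in expected):
        raise ValueError(f"every expected value should be at least {min_expected}")
    # Sum (O - E)^2 / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return statistic, df


# Hypothetical counts for four equally likely categories (not from the text).
observed = [8, 12, 10, 10]
expected = [10, 10, 10, 10]
statistic, df = goodness_of_fit_statistic(observed, expected)
print(statistic, df)
```

Raising an error when an expected value is below 5 is one possible design choice; the usual remedy is to combine categories before running the test.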

Example 11.1 

Absenteeism of college students from math classes is a major concern to math instructors because 
missing class appears to increase the drop rate. Suppose that a study was done to determine if the 
actual student absenteeism follows faculty perception. The faculty expected that a group of 100 
students would miss class according to the following chart. 



Number of absences per term    Expected number of students
0-2                            50
3-5                            30
6-8                            12
9-11                           6
12+                            2

Table 11.1 

A random survey across all mathematics courses was then done to determine the actual number 
(observed) of absences in a course. The next chart displays the result of that survey. 



Number of absences per term    Actual number of students
0-2                            35
3-5                            40
6-8                            20
9-11                           1
12+                            4

Table 11.2 

4 This content is available online at <http://cnx.org/content/m17192/1.8/>. 

Determine the null and alternate hypotheses needed to conduct a goodness-of-fit test. 

$H_0$: Student absenteeism fits faculty perception. 

The alternate hypothesis is the opposite of the null hypothesis. 

$H_a$: Student absenteeism does not fit faculty perception. 

Problem 1 

Can you use the information as it appears in the charts to conduct the goodness-of-fit test? 

Solution 

No. Notice that the expected number of students for the "12+" entry is less than 5 (it is 2). 
Combine that group with the "9 - 11" group to create new tables where the number of students for 
each entry is at least 5. The new tables are below. 



Number of absences per term    Expected number of students
0-2                            50
3-5                            30
6-8                            12
9+                             8

Table 11.3 



Number of absences per term    Actual number of students
0-2                            35
3-5                            40
6-8                            20
9+                             5

Table 11.4 



Problem 2 

What are the degrees of freedom (df)? 

Solution 

There are 4 "cells" or categories in each of the new tables. 

df = number of cells - 1 = 4 - 1 = 3 
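The regrouping in Problem 1 and the degrees of freedom in Problem 2 can be reproduced with a short sketch. The helper name below is hypothetical, and the routine only handles the situation in this example, where the small expected counts sit at the end of the list; it is not a general regrouping rule.

```python
def merge_small_tail(expected, observed, min_expected=5):
    """Merge the last category into its neighbor while the last expected count is too small."""
    expected = list(expected)
    observed = list(observed)
    while len(expected) > 1 and expected[-1] < min_expected:
        last_expected = expected.pop()
        last_observed = observed.pop()
        expected[-1] += last_expected
        observed[-1] += last_observed
    return expected, observed


# Counts from Table 11.1 (expected) and Table 11.2 (observed).
expected = [50, 30, 12, 6, 2]
observed = [35, 40, 20, 1, 4]

new_expected, new_observed = merge_small_tail(expected, observed)
df = len(new_expected) - 1
print(new_expected)  # [50, 30, 12, 8], as in Table 11.3
print(new_observed)  # [35, 40, 20, 5], as in Table 11.4
print(df)            # 3
```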



Example 11.2 

Employers particularly want to know which days of the week employees are absent in a five 
day work week. Most employers would like to believe that employees are absent equally during 
the week. Suppose a random sample of 60 managers were asked on which day of the week 
they had the highest number of employee absences. The results were distributed as follows: 

Day of the Week Employees were most Absent 

                      Monday    Tuesday    Wednesday    Thursday    Friday
Number of Absences    15        12         9            9           15

Table 11.5 

Problem 

For the population of employees, do the days for the highest number of absences occur with equal 
frequencies during a five day work week? Test at a 5% significance level. 

Solution 

The null and alternate hypotheses are: 

• $H_0$: The absent days occur with equal frequencies, that is, they fit a uniform distribution.

• $H_a$: The absent days occur with unequal frequencies, that is, they do not fit a uniform
distribution.

If the absent days occur with equal frequencies, then, out of 60 absent days (the total in the sample: 
15 + 12 + 9 + 9 + 15 = 60), there would be 12 absences on Monday, 12 on Tuesday, 12 on Wednesday, 
12 on Thursday, and 12 on Friday. These numbers are the expected (E) values. The values in the 
table are the observed (O) values or data. 

This time, calculate the $\chi^2$ test statistic by hand. Make a chart with the following headings and fill
in the columns:

• Expected (E) values (12, 12, 12, 12, 12)

• Observed (O) values (15, 12, 9, 9, 15)

• $(O - E)$

• $(O - E)^2$

• $\frac{(O - E)^2}{E}$

The last column ($\frac{(O - E)^2}{E}$) should have 0.75, 0, 0.75, 0.75, 0.75.

Now add (sum) the last column. Verify that the sum is 3. This is the $\chi^2$ test statistic.

To find the p-value, calculate $P(\chi^2 > 3)$. This test is right-tailed.

(Use a computer or calculator to find the p-value. You should get p-value = 0.5578.)

The degrees of freedom are the number of cells - 1 = 5 - 1 = 4.

TI-83+ and TI-84: Press 2nd DISTR. Arrow down to $\chi^2$cdf. Press ENTER. Enter (3,10^99,4).
Rounded to 4 decimal places, you should see 0.5578, which is the p-value.

Next, complete a graph like the one below with the proper labeling and shading. (You should 
shade the right tail.) 




[Figure: chi-square curve; horizontal axis $\chi^2$]



The decision is to not reject the null hypothesis. 

Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to 
conclude that the absent days do not occur with equal frequencies. 
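
As a cross-check on the hand calculation in this example, here is a minimal Python sketch that builds the last column, sums it, and finds the right-tail area. The function chi2_sf is a helper written for this sketch (it is not part of any calculator or library); it evaluates the right-tail chi-square area for whole-number degrees of freedom using the standard recurrence for the regularized upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive whole-number df."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, half)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

expected = [12, 12, 12, 12, 12]   # E values under equal frequencies
observed = [15, 12, 9, 9, 15]     # O values from Table 11.5

# Last column of the chart: (O - E)^2 / E for each day
last_column = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
test_statistic = sum(last_column)

df = len(observed) - 1            # number of cells - 1
p_value = chi2_sf(test_statistic, df)

print(last_column, test_statistic, round(p_value, 4))
```

The printed values agree with the example: a last column of 0.75, 0, 0.75, 0.75, 0.75, a test statistic of 3, and a p-value of 0.5578.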

NOTE: TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the
goodness-of-fit test. The next example (Example 11.3) has the calculator instructions. The newer
TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the observed values
(the data) into a first list and the expected values (the values you expect if the null hypothesis is
true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names for the Observed list
and the Expected list. Enter the degrees of freedom and press calculate or draw. Make sure you
clear any lists before you start. See below.

NOTE: To Clear Lists in the calculators: Go into STAT EDIT and arrow up to the list name area of 
the particular list. Press CLEAR and then arrow down. The list will be cleared. Or, you can press 
STAT and press 4 (for ClrList). Enter the list name and press ENTER. 



Example 11.3 

One study indicates that the number of televisions that American families have is distributed (this 
is the given distribution for the American population) as follows: 



Number of Televisions    Percent
0                        10
1                        16
2                        55
3                        11
over 3                   8

Table 11.6

The table contains expected (E) percents.

A random sample of 600 families in the far western United States resulted in the following data: 



Number of Televisions    Frequency
0                        66
1                        119
2                        340
3                        60
over 3                   15
                         Total = 600

Table 11.7

The table contains observed (O) frequency values. 

Problem 

At the 1% significance level, does it appear that the distribution "number of televisions" of far 
western United States families is different from the distribution for the American population as a 
whole? 

Solution 

This problem asks you to test whether the far western United States families distribution fits the 
distribution of the American families. This test is always right-tailed. 

The first table contains expected percentages. To get expected (E) frequencies, multiply the
percentage by 600. The expected frequencies are:

Number of Televisions    Percent    Expected Frequency
0                        10         (0.10) · (600) = 60
1                        16         (0.16) · (600) = 96
2                        55         (0.55) · (600) = 330
3                        11         (0.11) · (600) = 66
over 3                   8          (0.08) · (600) = 48

Table 11.8

Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the 
calculator do the math. For example, instead of 60, enter .10*600. 

$H_0$: The "number of televisions" distribution of far western United States families is the same as
the "number of televisions" distribution of the American population.

$H_a$: The "number of televisions" distribution of far western United States families is different from
the "number of televisions" distribution of the American population.

Distribution for the test: $\chi^2_4$ where df = (the number of cells) - 1 = 5 - 1 = 4.

NOTE: df $\neq$ 600 - 1

Calculate the test statistic: $\chi^2 = 29.65$

Graph:

[Figure: chi-square curve with 4 and 29.65 marked on the horizontal axis; p-value = 0.000006 (almost 0)]

Probability statement: p-value = $P(\chi^2 > 29.65)$ = 0.000006.

Compare $\alpha$ and the p-value:

• $\alpha$ = 0.01

• p-value = 0.000006

So, $\alpha$ > p-value.

Make a decision: Since $\alpha$ > p-value, reject $H_0$.

This means you reject the belief that the distribution for the far western states is the same as that 
of the American population as a whole. 

Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude 
that the "number of televisions" distribution for the far western United States is different from the 
"number of televisions" distribution for the American population as a whole. 

NOTE: TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure to clear lists L1,
L2, and L3 if they have data in them (see the note at the end of Example 11.2). Into L1, put
the observed frequencies 66, 119, 340, 60, 15. Into L2, put the expected frequencies .10*600,
.16*600, .55*600, .11*600, .08*600. Arrow over to list L3 and up to the name area "L3". Enter
(L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You
should see "sum(". Enter L3. Rounded to 2 decimal places, you should see 29.65. Press 2nd
DISTR. Press 7 or arrow down to 7:$\chi^2$cdf and press ENTER. Enter (29.65, 1E99, 4). You
should see 5.77E-6 = .000006 (rounded to 6 decimal places), which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the
observed values (the data) into a first list and the expected values (the values you expect if the
null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names
for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or
draw. Make sure you clear any lists before you start.
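
For readers working without a TI calculator, the same steps for this example can be sketched in Python. The observed frequencies are taken from Table 11.7 and the percents from Table 11.6; chi2_sf is a helper written for this sketch (an assumption of the illustration, not a library function) that returns the right-tail chi-square area for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive whole-number df."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, half)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Cells in order: 0, 1, 2, 3, over 3 televisions
percents = [0.10, 0.16, 0.55, 0.11, 0.08]   # expected proportions (Table 11.6)
observed = [66, 119, 340, 60, 15]           # observed frequencies (Table 11.7)

n = sum(observed)                           # sample size, 600
expected = [p * n for p in percents]        # expected frequencies

test_statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                      # number of cells - 1, not n - 1
p_value = chi2_sf(test_statistic, df)

alpha = 0.01
reject_null = alpha > p_value

print(round(test_statistic, 2), p_value, reject_null)
```

Rounded to 2 decimal places the statistic is 29.65, and the p-value is about 0.000006, so the sketch reaches the same decision as the example.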



Example 11.4 

Suppose you flip two coins 100 times. The results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the
coins fair? Test at a 5% significance level.

Solution

This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair
coins is {HH, HT, TH, TT}. Out of 100 flips, you would expect 25 HH, 25 HT, 25 TH, and 25 TT.
This is the expected distribution. The question, "Are the coins fair?" is the same as saying, "Does
the distribution of the coins (20 HH, 27 HT, 30 TH, 23 TT) fit the expected distribution?"

Random Variable: Let X = the number of heads in one flip of the two coins. X takes on the values
0, 1, 2. (There are 0, 1, or 2 heads in the flip of 2 coins.) Therefore, the number of cells is 3. Since
X = the number of heads, the observed frequencies are 20 (for 2 heads), 57 (for 1 head), and 23 (for
0 heads or both tails). The expected frequencies are 25 (for 2 heads), 50 (for 1 head), and 25 (for
0 heads or both tails). This test is right-tailed.

$H_0$: The coins are fair.

$H_a$: The coins are not fair.

Distribution for the test: $\chi^2_2$ where df = 3 - 1 = 2.

Calculate the test statistic: $\chi^2 = 2.14$

Graph:

[Figure: chi-square curve with the test statistic 2.14 marked on the horizontal axis $\chi^2$; p-value = 0.3430]

Probability statement: p-value = $P(\chi^2 > 2.14)$ = 0.3430

Compare $\alpha$ and the p-value:

• $\alpha$ = 0.05

• p-value = 0.3430

So, $\alpha$ < p-value.

Make a decision: Since $\alpha$ < p-value, do not reject $H_0$.

Conclusion: There is insufficient evidence to conclude that the coins are not fair. 
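
The grouping of the four outcomes into three cells, and the resulting test, can be sketched in Python as follows. The variable names are chosen for this illustration only, and chi2_sf is a helper written for the sketch (not a library function) that gives the right-tail chi-square area for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive whole-number df."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, half)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

flips = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}
n = sum(flips.values())                      # 100 flips of the two coins

# Cells by number of heads: 2 heads, 1 head, 0 heads
observed = [flips["HH"], flips["HT"] + flips["TH"], flips["TT"]]
probabilities = [0.25, 0.50, 0.25]           # fair-coin probabilities per cell
expected = [p * n for p in probabilities]

test_statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(test_statistic, df)

print(observed, expected, round(test_statistic, 2), round(p_value, 4))
```

The sketch reproduces the observed cells 20, 57, 23, the expected cells 25, 50, 25, the statistic 2.14, and the p-value 0.3430.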

NOTE: TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure you clear lists L1, L2,
and L3 if they have data in them. Into L1, put the observed frequencies 20, 57, 23. Into L2, put
the expected frequencies 25, 50, 25. Arrow over to list L3 and up to the name area "L3". Enter
(L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You
should see "sum(". Enter L3. Rounded to 2 decimal places, you should see 2.14. Press 2nd DISTR.
Arrow down to 7:$\chi^2$cdf (or press 7). Press ENTER. Enter (2.14, 1E99, 2). Rounded to 4 places, you
should see .3430, which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the
observed values (the data) into a first list and the expected values (the values you expect if the
null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names
for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or
draw. Make sure you clear any lists before you start.



11.5 Test of Independence 5 

Tests of independence involve using a contingency table of observed (data) values. You first saw a
contingency table when you studied probability in the Probability Topics (Section 4.1) chapter.

The test statistic for a test of independence is similar to that of a goodness-of-fit test:

$$\sum_{(i \cdot j)} \frac{(O - E)^2}{E}$$

where:

• O = observed values

• E = expected values

• i = the number of rows in the table

• j = the number of columns in the table

There are $i \cdot j$ terms of the form $\frac{(O - E)^2}{E}$.

A test of independence determines whether two factors are independent or not. You first encountered 
the term independence in Chapter 3. As a review, consider the following example. 

NOTE: The expected value for each cell needs to be at least 5 in order to use this test. 

Example 11.5 

Suppose A = a speeding violation in the last year and B = a cell phone user while driving. If A and
B are independent then P(A AND B) = P(A) P(B). A AND B is the event that a driver received
a speeding violation last year and is also a cell phone user while driving. Suppose, in a study of
drivers who received speeding violations in the last year and who use cell phones while driving,
that 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305
were cell phone users while driving and 450 were not.

Let y = expected number of drivers that use a cell phone while driving and received speeding
violations.

If A and B are independent, then P(A AND B) = P(A) P(B). By substitution,

$$\frac{y}{755} = \frac{70}{755} \cdot \frac{305}{755}$$

Solve for y: $y = \frac{70 \cdot 305}{755} = 28.3$

5 This content is available online at <http://cnx.org/content/m17191/1.12/>.

About 28 people from the sample are expected to be cell phone users while driving and to receive
speeding violations.
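
The substitution in this example is short enough to check directly. The following Python lines are a minimal sketch of that arithmetic; the variable names are chosen for this illustration.

```python
n = 755            # people surveyed
speeding = 70      # had a speeding violation (event A)
cell_phone = 305   # cell phone users while driving (event B)

p_a = speeding / n
p_b = cell_phone / n

# Under independence, P(A AND B) = P(A) * P(B), so the expected count is
# y = n * P(A) * P(B), which simplifies to 70 * 305 / 755.
y = n * p_a * p_b

print(round(y, 1))
```

Rounded to one decimal place this gives 28.3, matching the value solved for above.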

In a test of independence, we state the null and alternate hypotheses in words. Since the
contingency table consists of two factors, the null hypothesis states that the factors are independent
and the alternate hypothesis states that they are not independent (dependent). If we do a test of
independence using the example above, then the null hypothesis is:

$H_0$: Being a cell phone user while driving and receiving a speeding violation are independent
events.

If the null hypothesis were true, we would expect about 28 people to be cell phone users while 
driving and to receive a speeding violation. 

The test of independence is always right-tailed because of the calculation of the test statistic. If
the expected and observed values are not close together, then the test statistic is very large and
way out in the right tail of the chi-square curve, as in a goodness-of-fit test.

The degrees of freedom for the test of independence are:

df = (number of columns - 1) (number of rows - 1)

The following formula calculates the expected number (E):

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$

Example 11.6 

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend
time with a disabled senior citizen. The program recruits among community college students,
four-year college students, and nonstudents. The following table is a sample of the adult
volunteers and the number of hours they volunteer per week.

Number of Hours Worked Per Week by Volunteer Type (Observed)

Type of Volunteer             1-3 Hours    4-6 Hours    7-9 Hours    Row Total
Community College Students    111          96           48           255
Four-Year College Students    96           133          61           290
Nonstudents                   91           150          53           294
Column Total                  298          379          162          839

Table 11.9: The table contains observed (O) values (data).

Problem 

Are the number of hours volunteered independent of the type of volunteer? 

Solution 

The observed table and the question at the end of the problem, "Are the number of hours
volunteered independent of the type of volunteer?" tell you this is a test of independence. The two
factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

$H_0$: The number of hours volunteered is independent of the type of volunteer.

$H_a$: The number of hours volunteered is dependent on the type of volunteer.

The expected table is:

Number of Hours Worked Per Week by Volunteer Type (Expected)

Type of Volunteer             1-3 Hours    4-6 Hours    7-9 Hours
Community College Students    90.57        115.19       49.24
Four-Year College Students    103.00       131.00       56.00
Nonstudents                   104.42       132.81       56.77

Table 11.10: The table contains expected (E) values.
For example, the calculation for the expected frequency for the top left cell is

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{255 \cdot 298}{839} = 90.57$$

Calculate the test statistic: $\chi^2 = 12.99$ (calculator or computer)

Distribution for the test: $\chi^2_4$

df = (3 columns - 1) (3 rows - 1) = (2) (2) = 4

Graph:

[Figure: chi-square curve; horizontal axis $\chi^2$; p-value = 0.0113]

Probability statement: p-value = $P(\chi^2 > 12.99)$ = 0.0113

Compare $\alpha$ and the p-value: Since no $\alpha$ is given, assume $\alpha$ = 0.05. p-value = 0.0113. $\alpha$ > p-value.

Make a decision: Since $\alpha$ > p-value, reject $H_0$. This means that the factors are not independent.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude 
that the number of hours volunteered and the type of volunteer are dependent on one another. 
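
The whole test for this example can also be sketched in Python: build the expected table from the row and column totals, sum the (O - E)^2 / E terms, and find the right-tail area. The helper chi2_sf is written for this sketch and is an assumption of the illustration, not a library function.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive whole-number df."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, half)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Observed values from Table 11.9 (rows: volunteer type; columns: hours)
observed = [
    [111, 96, 48],    # community college students
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # nonstudents
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = (row total)(column total) / (total number surveyed) for each cell
expected = [[r * c / n for c in column_totals] for r in row_totals]

test_statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(column_totals) - 1) * (len(row_totals) - 1)
p_value = chi2_sf(test_statistic, df)

print(round(test_statistic, 2), df, round(p_value, 4))
```

With these inputs the sketch gives a test statistic of 12.99 with df = 4 and a p-value of 0.0113, in agreement with the example.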

For the above example, if there had been another type of volunteer, teenagers, what would the 
degrees of freedom be? 

NOTE: Calculator instructions follow. 

TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3
ENTER 3 ENTER. Enter the table values by row from Example 11.6. Press ENTER after each. Press
2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:$\chi^2$-Test. Press ENTER. You
should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test
statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time but arrow down to
Draw instead of Calculate.



Example 11.7 

De Anza College is interested in the relationship between anxiety level and the need to succeed 
in school. A random sample of 400 students took a test that measured anxiety level and need to 
succeed in school. The table shows the results. De Anza College wants to know if anxiety level 
and need to succeed in school are independent events. 



Need to Succeed in School vs. Anxiety Level

Need to Succeed    High       Med-high    Medium     Med-low    Low        Row
in School          Anxiety    Anxiety     Anxiety    Anxiety    Anxiety    Total
High Need          35         42          53         15         10         155
Medium Need        18         48          63         33         31         193
Low Need           4          5           11         15         17         52
Column Total       57         95          127        63         58         400

Table 11.11

Problem 1 

How many high anxiety level students are expected to have a high need to succeed in school? 

Solution 

The column total for a high anxiety level is 57. The row total for high need to succeed in school is 
155. The sample size or total surveyed is 400. 

$$E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} = \frac{155 \cdot 57}{400} = 22.09$$

The expected number of students who have a high anxiety level and a high need to succeed in 
school is about 22. 
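
The expected-count formula used in Problem 1 can be wrapped in a small helper. This is a minimal sketch; the function name expected_count is made up for this illustration.

```python
def expected_count(row_total, column_total, total_surveyed):
    """Expected cell count if the two factors are independent."""
    return row_total * column_total / total_surveyed

# High need to succeed (row total 155) and high anxiety (column total 57),
# out of 400 students surveyed (Table 11.11)
e_high_need_high_anxiety = expected_count(155, 57, 400)

print(e_high_need_high_anxiety)
```

The result is about 22.09, or roughly 22 students. The same helper applies to any other cell of the table once its row and column totals are read off.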



Problem 2 

If the two variables are independent, how many students do you expect to have a low need to 
succeed in school and a med-low level of anxiety? 

Solution 

The column total for a med-low anxiety level is 63. The row total for a low need to succeed in 
school is 52. The sample size or total surveyed is 400. 

a. $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$ = ________

b. The expected number of students who have a med-low anxiety level and a low need to succeed
in school is about: ________

11.6 Summary of Formulas 6

The Chi-Square Probability Distribution

$\mu = df$ and $\sigma = \sqrt{2 \cdot df}$

Goodness-of-Fit Hypothesis Test

• Use goodness-of-fit to test whether a data set fits a particular probability distribution.

• The degrees of freedom are number of cells or categories - 1.

• The test statistic is $\sum_{k} \frac{(O - E)^2}{E}$, where O = observed values (data), E = expected values (from theory),
and k = the number of different data cells or categories.

• The test is right-tailed.

Test of Independence

• Use the test of independence to test whether two factors are independent or not.

• The degrees of freedom are equal to (number of columns - 1) (number of rows - 1).

• The test statistic is $\sum_{(i \cdot j)} \frac{(O - E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows
in the table, and j = the number of columns in the table.

• The test is right-tailed.

• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.

Test of Homogeneity

• Use the test for homogeneity to decide if two populations with unknown distributions have the same
distribution as each other.

• The degrees of freedom are equal to number of columns - 1.

• The test statistic is $\sum_{(i \cdot j)} \frac{(O - E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows
in the table, and j = the number of columns in the table.

• The test is right-tailed.

• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.

NOTE: The expected value for each cell needs to be at least 5 in order to use the Goodness-of-Fit,
Independence and Homogeneity tests.

Test of a Single Variance

• Use the test to determine variation.

• The degrees of freedom are the number of data values - 1 (that is, n - 1).

• The test statistic is $\frac{(n - 1) \cdot s^2}{\sigma^2}$, where n = the total number of data, $s^2$ = sample variance, and $\sigma^2$ =
population variance.

• The test may be left, right, or two-tailed.

6 This content is available online at <http://cnx.org/content/m17058/1.8/>.

11.7 Practice 1: Goodness-of-Fit Test 7

11.7.1 Student Learning Outcomes 

• The student will conduct a goodness-of-fit test. 

11.7.2 Given 

The following data are real. The cumulative number of AIDS cases reported for Santa Clara County is
broken down by ethnicity as follows: (Source: HIV/AIDS Epidemiology Santa Clara County, Santa Clara
County Public Health Department, May 2011)

Ethnicity                   Number of Cases
White                       2229
Hispanic                    1157
Black / African-American    457
Asian, Pacific Islander     232
                            Total = 4075

Table 11.12

The percentage of each ethnic group in Santa Clara County is as follows:

Ethnicity                   Percentage of total      Number expected (round
                            county population        to 2 decimal places)
White                       42.9%                    1748.18
Hispanic                    26.7%                    ________
Black / African-American    2.6%                     ________
Asian, Pacific Islander     27.8%                    ________
                            Total = 100%

Table 11.13



11.7.3 Expected Results 

If the ethnicity of AIDS victims followed the ethnicity of the total county population, fill in the expected 
number of cases per ethnic group. 

11.7.4 Goodness-of-Fit Test 

Perform a goodness-of-fit test to determine whether the make-up of AIDS cases follows the ethnicity of the 
general population of Santa Clara County. 

Exercise 11.7.1 

$H_0$:

7 This content is available online at <http://cnx.org/content/m17054/1.12/>.

Exercise 11.7.2

$H_a$:

Exercise 11.7.3 

Is this a right-tailed, left-tailed, or two-tailed test? 

Exercise 11.7.4 (Solution on p. 495.) 

degrees of freedom = 

Exercise 11.7.5 (Solution on p. 495.) 

Chi test statistic = 

Exercise 11.7.6 (Solution on p. 495.) 

p-value = 

Exercise 11.7.7 

Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade 
in the region corresponding to the p-value. 



Let $\alpha$ = 0.05

Decision: 

Reason for the Decision: 

Conclusion (write out in complete sentences): 

11.7.5 Discussion Question 

Exercise 11.7.8 

Does it appear that the pattern of AIDS cases in Santa Clara County corresponds to the
distribution of ethnic groups in this county? Why or why not?

11.8 Practice 2: Contingency Tables 8

11.8.1 Student Learning Outcomes 

• The student will conduct a test for independence using contingency tables. 
Conduct a hypothesis test to determine if smoking level and ethnicity are independent. 

11.8.2 Collect the Data 

Copy the data provided in Probability Topics Practice 1: Calculating Probabilities into the table below. 

Smoking Levels by Ethnicity (Observed)

Smoking Level    African     Native      Latino    Japanese     White    TOTALS
Per Day          American    Hawaiian              Americans
1-10
11-20
21-30
31+
TOTALS

Table 11.14



11.8.3 Hypothesis 

State the hypotheses.

• $H_0$:

11.8.4 Expected Values

Enter expected values in the table above. Round to two decimal places.

11.8.5 Analyze the Data 

Calculate the following values: 

Exercise 11.8.1 (Solution on p. 495.)

Degrees of freedom =

Exercise 11.8.2 (Solution on p. 495.)

$\chi^2$ test statistic =

Exercise 11.8.3 (Solution on p. 495.)

p-value =

Exercise 11.8.4 (Solution on p. 495.)

Is this a right-tailed, left-tailed, or two-tailed test? Explain why.

8 This content is available online at <http://cnx.org/content/m17056/1.12/>.

11.8.6 Graph the Data

Exercise 11.8.5 

Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade 
in the region corresponding to the p-value. 



11.8.7 Conclusions 

State the decision and conclusion (in a complete sentence) for the following preconceived levels of $\alpha$.

Exercise 11.8.6 (Solution on p. 495.)

$\alpha$ = 0.05

a. Decision:

b. Reason for the decision:

c. Conclusion (write out in a complete sentence):

Exercise 11.8.7

$\alpha$ = 0.01

a. Decision:

b. Reason for the decision:

c. Conclusion (write out in a complete sentence):

11.9 Homework 9

Exercise 11.9.1 

a. Explain why the "goodness of fit" test and the "test for independence" are generally right-tailed
tests.

b. If you did a left-tailed test, what would you be testing? 



11.9.1 Word Problems 

For each word problem, use a solution sheet to solve the hypothesis test problem. Go to The Table of 
Contents 14. Appendix for the chi-square solution sheet. Round expected frequency to two decimal places. 

Exercise 11.9.2 

A 6-sided die is rolled 120 times. Fill in the expected frequency column. Then, conduct a
hypothesis test to determine if the die is fair. The data below are the result of the 120 rolls.

Face Value    Frequency    Expected Frequency
1             15           ________
2             29           ________
3             16           ________
4             15           ________
5             30           ________
6             15           ________

Table 11.15

Exercise 11.9.3 (Solution on p. 495.) 

The marital status distribution of the U.S. male population, age 15 and older, is as shown below. 
(Source: U.S. Census Bureau, Current Population Reports) 



Marital Status        Percent    Expected Frequency
never married         31.3       ________
married               56.1       ________
widowed               2.5        ________
divorced/separated    10.1       ________

Table 11.16

Suppose that a random sample of 400 U.S. young adult males, 18 - 24 years old, yielded the
following frequency distribution. We are interested in whether this age group of males fits the
distribution of the U.S. adult population. Calculate the frequency one would expect when surveying
400 people. Fill in the above table, rounding to two decimal places.

9 This content is available online at <http://cnx.org/content/m17028/1.20/>.

Marital Status        Frequency
never married         140
married               238
widowed               2
divorced/separated    20

Table 11.17

The next two questions refer to the following information. The columns in the chart below contain the
Race/Ethnicity of U.S. Public Schools for a recent year, the percentages for the Advanced Placement
Examinee Population for that class and the Overall Student Population. (Source: http://www.collegeboard.com).
Suppose the right column contains the result of a survey of 1000 local students from that year who took an
AP Exam.

Race/Ethnicity                               AP Examinee    Overall Student    Survey
                                             Population     Population         Frequency
Asian, Asian American or Pacific Islander    10.2%          5.4%               113
Black or African American                    8.2%           14.5%              94
Hispanic or Latino                           15.5%          15.9%              136
American Indian or Alaska Native             0.6%           1.2%               10
White                                        59.4%          61.6%              604
Not reported/other                           6.1%           1.4%               43

Table 11.18



Exercise 11.9.4 

Perform a goodness-of-fit test to determine whether the local results follow the distribution of the
U.S. Overall Student Population based on ethnicity.

Exercise 11.9.5 (Solution on p. 495.)

Perform a goodness-of-fit test to determine whether the local results follow the distribution of the
U.S. AP Examinee Population, based on ethnicity.

Exercise 11.9.6 

The City of South Lake Tahoe, CA, has an Asian population of 1419 people, out of a total
population of 23,609 (Source: U.S. Census Bureau). Suppose that a survey of 1419 self-reported Asians
in the Manhattan, NY, area yielded the data in the table below. Conduct a goodness-of-fit test to
determine if the self-reported sub-groups of Asians in the Manhattan area fit that of the Lake Tahoe
area.

Race            Lake Tahoe Frequency    Manhattan Frequency
Asian Indian    131                     174
Chinese         118                     557
Filipino        1045                    518
Japanese        80                      54
Korean          12                      29
Vietnamese      9                       21
Other           24                      66

Table 11.19

The next two questions refer to the following information: UCLA conducted a survey of more than 
263,000 college freshmen from 385 colleges in fall 2005. The results of student expected majors by gender 
were reported in The Chronicle of Higher Education (2/2/2006). Suppose a survey of 5000 graduating 
females and 5000 graduating males was done as a follow-up last year to determine what their actual major 
was. The results are shown in the tables for Exercises 7 and 8. The second column in each table does not 
add to 100% because of rounding. 

Exercise 11.9.7 (Solution on p. 495.) 

Conduct a hypothesis test to determine if the actual college major of graduating females fits the 
distribution of their expected majors. 



Major                  Women - Expected Major    Women - Actual Major
Arts & Humanities      14.0%                     670
Biological Sciences    8.4%                      410
Business               13.1%                     685
Education              13.0%                     650
Engineering            2.6%                      145
Physical Sciences      2.6%                      125
Professional           18.9%                     975
Social Sciences        13.0%                     605
Technical              0.4%                      15
Other                  5.8%                      300
Undecided              8.0%                      420

Table 11.20



Exercise 11.9.8 

Conduct a hypothesis test to determine if the actual college major of graduating males fits the
distribution of their expected majors.

Major                  Men - Expected Major    Men - Actual Major
Arts & Humanities      11.0%                   600
Biological Sciences    6.7%                    330
Business               22.7%                   1130
Education              5.8%                    305
Engineering            15.6%                   800
Physical Sciences      3.6%                    175
Professional           9.3%                    460
Social Sciences        7.6%                    370
Technical              1.8%                    90
Other                  8.2%                    400
Undecided              6.6%                    340

Table 11.21

Exercise 11.9.9 (Solution on p. 495.) 

A recent debate about where in the United States skiers believe the skiing is best prompted the 
following survey. Test to see if the best ski area is independent of the level of the skier. 



U.S. Ski Area    Beginner    Intermediate    Advanced
Tahoe            20          30              40
Utah             10          30              60
Colorado         10          40              50

Table 11.22

Exercise 11.9.10 

Car manufacturers are interested in whether there is a relationship between the size of car an 
individual drives and the number of people in the driver's family (that is, whether car size and 
family size are independent). To test this, suppose that 800 car owners were randomly surveyed 
with the following results. Conduct a test for independence. 



Family Size    Sub & Compact    Mid-size    Full-size    Van & Truck
1              20               35          40           35
2              20               50          70           80
3-4            20               50          100          90
5+             20               30          70           70

Table 11.23



Exercise 11.9.11 (Solution on p. 496.) 

College students may be interested in whether or not their majors have any effect on starting
salaries after graduation. Suppose that 300 recent graduates were surveyed as to their majors
in college and their starting salaries after graduation. Below are the data. Conduct a test for
independence.

Major          < $50,000    $50,000 - $68,999    $69,000 +
English        5            20                   5
Engineering    10           30                   60
Nursing        10           15                   15
Business       10           20                   30
Psychology     20           30                   20

Table 11.24

Exercise 11.9.12 

Some travel agents claim that honeymoon hot spots vary according to age of the bride and groom.
Suppose that 280 East Coast recent brides were interviewed as to where they spent their
honeymoons. The information is given below. Conduct a test for independence.

Location          20-29    30-39    40-49    50 and over
Niagara Falls     15       25       25       20
Poconos           15       25       25       10
Europe            10       25       15       5
Virgin Islands    20       25       15       5

Table 11.25

Exercise 11.9.13 (Solution on p. 496.) 

A manager of a sports club keeps information concerning the main sport in which members 
participate and their ages. To test whether there is a relationship between the age of a member 
and his or her choice of sport, 643 members of the sports club are randomly selected. Conduct a 
test for independence. 



Sport          18-25    26-30    31-40    41 and over
racquetball    42       58       30       46
tennis         58       76       38       65
swimming       72       60       65       33


Table 11.26 



Exercise 11.9.14 

A major food manufacturer is concerned that the sales for its skinny French fries have been
decreasing. As a part of a feasibility study, the company conducts research into the types of fries sold
across the country to determine if the type of fries sold is independent of the area of the country.
The results of the study are below. Conduct a test for independence.

Type of Fries    Northeast    South    Central    West
skinny fries     70           50       20         25
curly fries      100          60       15         30
steak fries      20           40       10         10


Table 11.27 

Exercise 11.9.15 (Solution on p. 496.) 

According to Dan Lenard, an independent insurance agent in the Buffalo, N.Y. area, the following 
is a breakdown of the amount of life insurance purchased by males in the following age groups. 
He is interested in whether the age of the male and the amount of life insurance purchased are 
independent events. Conduct a test for independence. 



Age of Males    None    < $200,000    $200,000 - $400,000    $401,001 - $1,000,000    $1,000,000 +
20-29           40      15            40                     0                        5
30-39           35      5             20                     20                       10
40-49           20      0             30                     0                        30
50 +            40      30            15                     15                       10


Table 11.28 

Exercise 11.9.16 

Suppose that 600 thirty-year-olds were surveyed to determine whether or not there is a
relationship between the level of education an individual has and salary. Conduct a test for independence.



Annual Salary        Not a high school graduate    High school graduate    College graduate    Masters or doctorate
< $30,000            15                            25                      10                  5
$30,000 - $40,000    20                            40                      70                  30
$40,000 - $50,000    10                            20                      40                  55
$50,000 - $60,000    5                             10                      20                  60
$60,000 +            0                             5                       10                  150


Table 11.29 

Exercise 11.9.17 (Solution on p. 496.) 

A psychologist is interested in testing whether there is a difference in the distribution of
personality types for business majors and social science majors. The results of the study are shown below.
Conduct a Test of Homogeneity. Test at a 5% level of significance.





                  Open    Conscientious    Extrovert    Agreeable    Neurotic
Business          41      52               46           61           58
Social Science    72      75               63           80           65

Table 11.30

Exercise 11.9.18 (Solution on p. 496.) 

Do men and women select different breakfasts? The breakfast ordered by randomly selected men
and women at a popular breakfast place is shown below. Conduct a test of homogeneity. Test at a
5% level of significance.





         French Toast    Pancakes    Waffles    Omelettes
Men      47              35          28         53
Women    65              59          55         60



Table 11.31 



Exercise 11.9.19 (Solution on p. 496.) 

Is there a difference between the distribution of community college statistics students and the 
distribution of university statistics students in what technology they use on their homework? Of 
the randomly selected community college students, 43 used a computer, 102 used a calculator
with built-in statistics functions, and 65 used a table from the textbook. Of the randomly selected
university students, 28 used a computer, 33 used a calculator with built-in statistics functions, and
40 used a table from the textbook. Conduct an appropriate hypothesis test using a 0.05 level of 
significance. 

Exercise 11.9.20 (Solution on p. 496.) 

A fisherman is interested in whether the distribution of fish caught in Green Valley Lake is the 
same as the distribution of fish caught in Echo Lake. Of the 191 randomly selected fish caught in 
Green Valley Lake, 105 were rainbow trout, 27 were other trout, 35 were bass, and 24 were catfish. 
Of the 293 randomly selected fish caught in Echo Lake, 115 were rainbow trout, 58 were other 
trout, 67 were bass, and 53 were catfish. Perform the hypothesis test at a 5% level of significance. 

Exercise 11.9.21 (Solution on p. 497.) 

A plant manager is concerned her equipment may need recalibrating. It seems that the actual
weight of the 15 oz. cereal boxes it fills has been fluctuating. The standard deviation should be
at most 0.5 oz. In order to determine if the machine needs to be recalibrated, 84 randomly selected
boxes of cereal from the next day's production were weighed. The standard deviation of the 84
boxes was 0.54 oz. Does the machine need to be recalibrated?

Exercise 11.9.22 

Consumers may be interested in whether the cost of a particular calculator varies from store to 
store. Based on surveying 43 stores, which yielded a sample mean of $84 and a sample standard 
deviation of $12, test the claim that the standard deviation is greater than $15. 

Exercise 11.9.23 (Solution on p. 497.) 

Isabella, an accomplished Bay to Breakers runner, claims that the standard deviation for her time
to run the 7 1/2 mile race is at most 3 minutes. To test her claim, Rupinder looks up 5 of her race
times. They are 55 minutes, 61 minutes, 58 minutes, 63 minutes, and 57 minutes.

Exercise 11.9.24 

Airline companies are interested in the consistency of the number of babies on each flight, so that 
they have adequate safety equipment. They are also interested in the variation of the number of 
babies. Suppose that an airline executive believes the average number of babies on flights is 6 with 
a variance of 9 at most. The airline conducts a survey. The results of the 18 flights surveyed give 
a sample average of 6.4 with a sample standard deviation of 3.9. Conduct a hypothesis test of the 
airline executive's belief.

Exercise 11.9.25 (Solution on p. 497.)

The number of births per woman in China is 1.6, down from 5.91 in 1966 (Source: World Bank,
6/5/12). This fertility rate has been attributed to the law passed in 1979 restricting births to one
per woman. Suppose that a group of students studied whether or not the standard deviation of 
births per woman was greater than 0.75. They asked 50 women across China the number of births 
they had. Below are the results. Does the students' survey indicate that the standard deviation is 
greater than 0.75? 



# of births    Frequency
0              5
1              30
2              10
3              5



Table 11.32 

Exercise 11.9.26 

According to an avid aquarist, the average number of fish in a 20-gallon tank is 10, with a
standard deviation of 2. His friend, also an aquarist, does not believe that the standard deviation
is 2. She counts the number of fish in 15 other 20-gallon tanks. Based on the results that follow, do 
you think that the standard deviation is different from 2? Data: 11; 10; 9; 10; 10; 11; 11; 10; 12; 9; 7; 
9; 11; 10; 11 

Exercise 11.9.27 (Solution on p. 497.) 

The manager of "Frenchies" is concerned that patrons are not consistently receiving the same
amount of French fries with each order. The chef claims that the standard deviation for a
10-ounce order of fries is at most 1.5 oz., but the manager thinks that it may be higher. He randomly
weighs 49 orders of fries, which yields a mean of 11 oz. and a standard deviation of 2 oz.



11.9.2 Try these true/false questions. 

Exercise 11.9.28 (Solution on p. 497.) 

As the degrees of freedom increase, the graph of the chi-square distribution looks more and more 
symmetrical. 

Exercise 11.9.29 (Solution on p. 497.) 

The standard deviation of the chi-square distribution is twice the mean. 

Exercise 11.9.30 (Solution on p. 497.) 

The mean and the median of the chi-square distribution are the same if df = 24. 

Exercise 11.9.31 (Solution on p. 497.) 

In a Goodness-of-Fit test, the expected values are the values we would expect if the null hypoth- 
esis were true. 

Exercise 11.9.32 (Solution on p. 497.) 

In general, if the observed values and expected values of a Goodness-of-Fit test are not close 
together, then the test statistic can get very large and on a graph will be way out in the right tail. 

Exercise 11.9.33 (Solution on p. 497.) 

The degrees of freedom for a Test for Independence are equal to the sample size minus 1.

Exercise 11.9.34 (Solution on p. 498.)

Use a Goodness-of-Fit test to determine if high school principals believe that students are absent 
equally during the week or not. 

Exercise 11.9.35 (Solution on p. 498.) 

The Test for Independence uses tables of observed and expected data values. 

Exercise 11.9.36 (Solution on p. 498.) 

The test to use when determining if the college or university a student chooses to attend is related 
to his/her socioeconomic status is a Test for Independence. 

Exercise 11.9.37 (Solution on p. 498.) 

The test to use to determine if a six-sided die is fair is a Goodness-of-Fit test. 

Exercise 11.9.38 (Solution on p. 498.) 

In a Test of Independence, the expected number is equal to the row total multiplied by the column 
total divided by the total surveyed. 

Exercise 11.9.39 (Solution on p. 498.) 

In a Goodness-of-Fit test, if the p-value is 0.0113, in general, do not reject the null hypothesis.

Exercise 11.9.40 (Solution on p. 498.) 

For a Chi-Square distribution with degrees of freedom of 17, the probability that a value is greater 
than 20 is 0.7258. 

Exercise 11.9.41 (Solution on p. 498.) 

If df = 2, the chi-square distribution has a shape that reminds us of the exponential.

11.10 Review 10

The next two questions refer to the following real study: 

A recent survey of U.S. teenage pregnancy was answered by 720 girls, age 12 - 19. 6% of the girls surveyed 
said they have been pregnant. (Parade Magazine) We are interested in the true proportion of U.S. girls, age 
12 - 19, who have been pregnant. 

Exercise 11.10.1 (Solution on p. 498.) 

Find the 95% confidence interval for the true proportion of U.S. girls, age 12 - 19, who have been 
pregnant. 

Exercise 11.10.2 (Solution on p. 498.) 

The report also stated that the results of the survey are accurate to within ± 3.7% at the 95%
confidence level. Suppose that a new study is to be done. It is desired to be accurate to within 2%
at the 95% confidence level. What is the minimum number that should be surveyed?

Exercise 11.10.3 

Given: X ~ Exp I i j . Sketch the graph that depicts: P (x > 1). 

The next four questions refer to the following information: 

Suppose that the time that owners keep their cars (purchased new) is normally distributed with a mean 
of 7 years and a standard deviation of 2 years. We are interested in how long an individual keeps his car 
(purchased new). Our population is people who buy their cars new. 

Exercise 11.10.4 (Solution on p. 498.) 

60% of individuals keep their cars at most how many years? 

Exercise 11.10.5 (Solution on p. 498.) 

Suppose that we randomly survey one person. Find the probability that person keeps his/her car 
less than 2.5 years. 

Exercise 11.10.6 (Solution on p. 498.) 

If we are to pick individuals 10 at a time, find the distribution for the average length of car
ownership.

Exercise 11.10.7 (Solution on p. 498.) 

If we are to pick 10 individuals, find the probability that the sum of their ownership time is more 
than 55 years. 

Exercise 11.10.8 (Solution on p. 498.) 

For which distribution is the median not equal to the mean? 

A. Uniform 

B. Exponential 

C. Normal 

D. Student-t 

Exercise 11.10.9 (Solution on p. 498.) 

Compare the standard normal distribution to the student-t distribution, centered at 0. Explain 
which of the following are true and which are false. 

a. As the number surveyed increases, the area to the left of -1 for the student-t distribution
approaches the area for the standard normal distribution.

b. As the degrees of freedom decrease, the graph of the student-t distribution looks more like the
graph of the standard normal distribution.

c. If the number surveyed is 15, the normal distribution should never be used.

10 This content is available online at <http://cnx.org/content/m17057/1.10/>.

The next five questions refer to the following information: 

We are interested in the checking account balance of a twenty-year-old college student. We randomly
survey 16 twenty-year-old college students. We obtain a sample mean of $640 and a sample standard
deviation of $150. Let X = checking account balance of an individual twenty-year-old college student.

Exercise 11.10.10 

Explain why we cannot determine the distribution of X. 

Exercise 11.10.11 (Solution on p. 498.) 

If you were to create a confidence interval or perform a hypothesis test for the population average
checking account balance of 20-year-old college students, what distribution would you use?

Exercise 11.10.12 (Solution on p. 498.) 

Find the 95% confidence interval for the true average checking account balance of a twenty-year- 
old college student. 

Exercise 11.10.13 (Solution on p. 498.) 

What type of data is the balance of the checking account considered to be? 

Exercise 11.10.14 (Solution on p. 498.) 

What type of data is the number of 20-year-olds considered to be?

Exercise 11.10.15 (Solution on p. 498.) 

On average, a busy emergency room gets a patient with a shotgun wound about once per week. 
We are interested in the number of patients with a shotgun wound the emergency room gets per 
28 days. 

a. Define the random variable X. 

b. State the distribution for X. 

c. Find the probability that the emergency room gets no patients with shotgun wounds in the next 

28 days. 

The next two questions refer to the following information: 

The probability that a certain slot machine will pay back money when a quarter is inserted is 0.30. Assume
that each play of the slot machine is independent of the others. A person puts in 15 quarters for 15 plays.

Exercise 11.10.16 (Solution on p. 499.) 

Is the expected number of plays of the slot machine that will pay back money greater than, less 
than or the same as the median? Explain your answer. 

Exercise 11.10.17 (Solution on p. 499.) 

Is it likely that exactly 8 of the 15 plays would pay back money? Justify your answer numerically. 

Exercise 11.10.18 (Solution on p. 499.) 

A game is played with the following rules: 

• it costs $10 to enter 

• a fair coin is tossed 4 times 

• if you do not get 4 heads or 4 tails, you lose your $10 

• if you get 4 heads or 4 tails, you get back your $10, plus $30 more 

Over the long run of playing this game, what are your expected earnings? 

Exercise 11.10.19 (Solution on p. 499.)

• The average grade on a math exam in Rachel's class was 74, with a standard deviation of 5.
Rachel earned an 80.

• The average grade on a math exam in Becca's class was 47, with a standard deviation of 2.
Becca earned a 51.

• The average grade on a math exam in Matt's class was 70, with a standard deviation of 8.
Matt earned an 83.

Find whose score was the best, compared to his or her own class. Justify your answer numerically.

The next two questions refer to the following information:

A random sample of 70 compulsive gamblers were asked the number of days they go to casinos per week. 
The results are given in the following graph: 

[Graph not reproduced: Relative Frequency (vertical axis) versus Number of Days (horizontal axis).]



Figure 11.3 



Exercise 11.10.20 (Solution on p. 499.) 

Find the number of responses that were "5". 

Exercise 11.10.21 (Solution on p. 499.) 

Find the mean, standard deviation, the median, the first quartile, the third quartile and the IQR. 

Exercise 11.10.22 (Solution on p. 499.) 

Based upon research at De Anza College, it is believed that about 19% of the student population 
speaks a language other than English at home. 

Suppose that a study was done this year to see if that percent has decreased. Ninety-eight students
were randomly surveyed with the following results. Fourteen said that they speak a language
other than English at home.

a. State an appropriate null hypothesis. 

b. State an appropriate alternate hypothesis. 

c. Define the Random Variable, P'.

d. Calculate the test statistic.

e. Calculate the p-value.

f. At the 5% level of significance, what is your decision about the null hypothesis?

g. What is the Type I error?

h. What is the Type II error?

Exercise 11.10.23 (Solution on p. 499.) 

Assume that you are an emergency paramedic called in to rescue victims of an accident. You 
need to help a patient who is bleeding profusely. The patient is also considered to be a high risk 
for contracting AIDS. Assume that the null hypothesis is that the patient does not have the HIV 
virus. What is a Type I error? 

Exercise 11.10.24 (Solution on p. 499.) 

It is often said that Californians are more casual than the rest of Americans. Suppose that a 
survey was done to see if the proportion of Californian professionals that wear jeans to work is 
greater than the proportion of non-Californian professionals. Fifty of each was surveyed with the 
following results. 15 Californians wear jeans to work and 6 non-Californians wear jeans to work. 

• C = Californian professional 

• NC = non-Californian professional 

a. State appropriate null and alternate hypotheses. 

b. Define the Random Variable. 

c. Calculate the test statistic and p-value. 

d. At the 5% significance level, what is your decision? 

e. What is the Type I error? 

f. What is the Type II error?

The next two questions refer to the following information: 

A group of Statistics students have developed a technique that they feel will lower their anxiety level on 
statistics exams. They measured their anxiety level at the start of the quarter and again at the end of the 
quarter. Recorded is the paired data in that order: (1000, 900); (1200, 1050); (600, 700); (1300, 1100); (1000, 
900); (900, 900). 

Exercise 11.10.25 (Solution on p. 499.) 

This is a test of (pick the best answer): 

A. large samples, independent means 

B. small samples, independent means 

C. dependent means 

Exercise 11.10.26 (Solution on p. 499.) 

State the distribution to use for the test.

11.11 Lab 1: Chi-Square Goodness-of-Fit 11

Class Time: 
Names: 

11.11.1 Student Learning Outcome: 

• The student will evaluate data collected to determine if they fit either the uniform or exponential 
distributions. 



11.11.2 Collect the Data 

NOTE: You may need to combine two categories so that each cell has an expected value of at least 5.

Go to your local supermarket. Ask 30 people as they leave for the total amount on their grocery receipts. 
(Or, ask 3 cashiers for the last 10 amounts. Be sure to include the express lane, if it is open.) 

1. Record the values. 



Table 11.33 

2. Construct a histogram of the data. Make 5-6 intervals. Sketch the graph using a ruler and pencil. 
Scale the axes. 



11 This content is available online at <http://cnx.org/content/m17049/1.9/>.

[Blank axes for the histogram: Relative Frequency (vertical axis) versus Amount of Receipt (horizontal axis).]

Figure 11.4 



3. Calculate the following: 



a. $\bar{x}$ =

b. $s$ =

c. $s^2$ =



11.11.3 Uniform Distribution 

Test to see if grocery receipts follow the uniform distribution. 



1. Using your lowest and highest values, X ~ U(_____, _____)

2. Divide the distribution above into fifths. 

3. Calculate the following: 

a. Lowest value = 

b. 20th percentile = 

c. 40th percentile = 

d. 60th percentile = 

e. 80th percentile = 

f. Highest value =

4. For each fifth, count the observed number of receipts and record it. Then determine the expected
number of receipts and record that.

Fifth    Observed    Expected
1st
2nd
3rd
4th
5th


Table 11.34 



5. $H_0$:

6. $H_a$:

7. What distribution should you use for a hypothesis test?

8. Why did you choose this distribution? 

9. Calculate the test statistic. 

10. Find the p-value. 

11. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the
p-value.



Figure 11.5

12. State your decision.

13. State your conclusion in a complete sentence. 
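
The counting and test-statistic steps above can be sketched in software. This is only a minimal illustration under stated assumptions: the `receipts` values are invented for the example (the lab asks you to collect your own), the function name `uniform_fifths_test` is hypothetical, and the degrees of freedom are taken as the number of cells minus 1.

```python
# Hypothetical receipt totals in dollars (invented data, 30 values).
receipts = [
    12.50, 8.75, 23.10, 45.60, 67.20, 5.40, 15.80, 32.90, 18.25, 9.99,
    54.30, 27.45, 11.20, 38.75, 7.60, 21.00, 16.40, 72.15, 13.35, 29.80,
    6.25, 41.10, 19.95, 10.50, 25.70, 14.60, 35.20, 8.10, 60.40, 17.75,
]


def uniform_fifths_test(values):
    """Split [min, max] into five equal-width cells and compare counts.

    Returns the observed counts, the expected counts under a uniform
    distribution, the chi-square statistic, and the degrees of freedom.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / 5
    observed = [0] * 5
    for v in values:
        # The highest value would land in a sixth cell, so clamp it to the fifth.
        cell = min(int((v - lowest) / width), 4)
        observed[cell] += 1
    # Under the uniform model each fifth is expected to hold n/5 receipts.
    expected = [len(values) / 5] * 5
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return observed, expected, statistic, df


observed, expected, statistic, df = uniform_fifths_test(receipts)
print("observed:", observed)
print("expected:", expected)
print("chi-square statistic:", round(statistic, 2), "with df =", df)
```

The p-value is the area to the right of the statistic under the chi-square curve with that many degrees of freedom; a calculator or statistics package is still needed for that step.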



11.11.4 Exponential Distribution 

Test to see if grocery receipts follow the exponential distribution with decay parameter $1/\bar{x}$.

1. Using $1/\bar{x}$ as the decay parameter, X ~ Exp(_____).

2. Calculate the following: 

a. Lowest value = 

b. First quartile = 

c. 37th percentile = 

d. Median = 

e. 63rd percentile = 

f. 3rd quartile = 

g. Highest value = 

3. For each cell, count the observed number of receipts and record it. Then determine the expected 
number of receipts and record that. 



Cell    Observed    Expected
1st
2nd
3rd
4th
5th
6th


Table 11.35 

4. $H_0$:

5. $H_a$:

6. What distribution should you use for a hypothesis test? 

7. Why did you choose this distribution? 

8. Calculate the test statistic. 

9. Find the p-value. 

10. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the
p-value.

Figure 11.6



11. State your decision. 

12. State your conclusion in a complete sentence. 
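
The cell boundaries and expected counts for the exponential test can be sketched as follows. This is an illustration under assumptions, not the lab's prescribed method: the decay parameter is taken to be 1 divided by the sample mean, the cells are bounded by the 25th, 37th, 50th, 63rd, and 75th percentiles listed in step 2, and the sample mean of 20 and sample size of 30 in the usage lines are made-up inputs. The function name `exponential_cells` is hypothetical.

```python
import math


def exponential_cells(sample_mean, n):
    """Percentile boundaries and expected counts for six exponential cells."""
    decay = 1 / sample_mean  # assumed estimate of the decay parameter
    cut_probs = [0.25, 0.37, 0.50, 0.63, 0.75]
    # For X ~ Exp(decay), the p-th percentile solves 1 - exp(-decay * x) = p.
    boundaries = [-math.log(1 - p) / decay for p in cut_probs]
    cumulative = [0.0] + cut_probs + [1.0]
    # Each cell's expected count is n times the probability between its cut points.
    expected = [n * (cumulative[i + 1] - cumulative[i]) for i in range(6)]
    return boundaries, expected


boundaries, expected = exponential_cells(sample_mean=20.0, n=30)
print("cell boundaries:", [round(b, 2) for b in boundaries])
print("expected counts:", [round(e, 2) for e in expected])
```

With only 30 receipts the four middle cells have expected counts below 5, which is the situation the NOTE in the Collect the Data section warns about: adjacent cells would need to be combined before running the test.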



11.11.5 Discussion Questions 

1. Did your data fit either distribution? If so, which?

2. In general, do you think it's likely that data could fit more than one distribution? In complete
sentences, explain why or why not.

11.12 Lab 2: Chi-Square Test for Independence 12

Class Time:
Names:



11.12.1 Student Learning Outcome: 

• The student will evaluate if there is a significant relationship between favorite type of snack and 
gender. 



11.12.2 Collect the Data 

1. Using your class as a sample, complete the following chart.

NOTE: You may need to combine two food categories so that each cell has an expected value
of at least 5.

Favorite type of snack

          sweets (candy & baked goods)    ice cream    chips & pretzels    fruits & vegetables    Total
male
female
Total


Table 11.36 



2. Looking at the above chart, does it appear to you that there is dependence between gender and
favorite type of snack food? Why or why not?



11.12.3 Hypothesis Test 

Conduct a hypothesis test to determine if the factors are independent.

1. $H_0$:

2. $H_a$:

3. What distribution should you use for a hypothesis test? 

4. Why did you choose this distribution? 

5. Calculate the test statistic. 

6. Find the p-value. 

7. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the
p-value.

12 This content is available online at <http://cnx.org/content/m17050/1.11/>.

Figure 11.7



8. State your decision. 

9. State your conclusion in a complete sentence. 



11.12.4 Discussion Questions 

1. Is the conclusion of your study the same as or different from your answer to (12) above? 

2. Why do you think that occurred?

Solutions to Exercises in Chapter 11
Solutions to Practice 1: Goodness-of-Fit Test 

Solution to Exercise 11.7.4 (p. 472) 

degrees of freedom = 3 
Solution to Exercise 11.7.5 (p. 472) 

2016.14 
Solution to Exercise 11.7.6 (p. 472) 

Rounded to 4 decimal places, the p-value is 0.0000. 

Solutions to Practice 2: Contingency Tables 

Solution to Exercise 11.8.1 (p. 473) 

12 
Solution to Exercise 11.8.2 (p. 473) 

10301.8 
Solution to Exercise 11.8.3 (p. 473) 


Solution to Exercise 11.8.4 (p. 473) 

right 
Solution to Exercise 11.8.6 (p. 474) 

a. Reject the null hypothesis 

Solutions to Homework 
Solution to Exercise 11.9.3 (p. 475) 

a. The data fits the distribution 

b. The data does not fit the distribution 

c. 3 

e. 19.27 

f. 0.0002 

h. Decision: Reject Null; Conclusion: Data does not fit the distribution. 

Solution to Exercise 11.9.5 (p. 476) 

c. 5 

e. 13.4 

f. 0.0199 

g. Decision: Reject null when α = 0.05; Conclusion: Local data do not fit the AP Examinee Distribution.

Decision: Do not reject null when α = 0.01; Conclusion: There is insufficient evidence to conclude
that Local data do not fit the AP Examinee Distribution.

Solution to Exercise 11.9.7 (p. 477) 

c. 10 

e. 11.48 

f. 0.3214 

h. Decision: Do not reject null when α = 0.05 and α = 0.01; Conclusion: There is insufficient evidence to
conclude that the distribution of majors by graduating females does not fit the distribution of expected
majors.

Solution to Exercise 11.9.9 (p. 478)

c. 4

e. 10.53 

f. 0.0324 

h. Decision: Reject null; Conclusion: Best ski area and level of skier are not independent. 

Solution to Exercise 11.9.11 (p. 478) 

c. 8 

e. 33.55 

f. 0

h. Decision: Reject null; Conclusion: Major and starting salary are not independent events. 
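
As a check on the values above, the test statistic for Table 11.24 can be recomputed with a short script. This is a sketch rather than the book's calculator procedure; the function names are illustrative, and the p-value helper uses a closed form that is valid only when the degrees of freedom are even (which holds here, since df = 8).

```python
import math

# Observed counts from Table 11.24 (rows: majors, columns: salary ranges).
observed = [
    [5, 20, 5],     # English
    [10, 30, 60],   # Engineering
    [10, 15, 15],   # Nursing
    [10, 20, 30],   # Business
    [20, 30, 20],   # Psychology
]


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count = (row total)(column total) / (total surveyed).
            exp = row_totals[i] * col_totals[j] / total
            statistic += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def right_tail_even_df(x, df):
    """Right-tail chi-square probability; closed form for even df only."""
    half = x / 2
    term, series = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        series += term
    return math.exp(-half) * series


statistic, df = chi_square_independence(observed)
p_value = right_tail_even_df(statistic, df)
print(round(statistic, 2), df, round(p_value, 4))
```

The p-value is far below 0.05, which agrees with the decision to reject the null hypothesis.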

Solution to Exercise 11.9.13 (p. 479) 

c. 6 

e. 25.21 

f. 0.0003 

h. Decision: Reject null 

Solution to Exercise 11.9.15 (p. 480) 

c. 12 

e. 125.74 

f. 0

h. Decision: Reject null 

Solution to Exercise 11.9.17 (p. 480) 

c: 4 

d: Chi-Square with df = 4 

e: 3.01 

f: p-value = 0.5568 

h: ii. Do not reject the null hypothesis. 

iv. There is insufficient evidence to conclude that the distribution of personality types is different for 

business and social science majors. 

Solution to Exercise 11.9.18 (p. 481) 

c: 3 

e: 4.01 

f: p-value = 0.2601 

h: ii. Do not reject the null hypothesis. 

iv. There is insufficient evidence to conclude that the distribution of breakfast ordered is different for 

men and women. 

Solution to Exercise 11.9.19 (p. 481) 

c: 2 

e: 7.05 

f: p-value = 0.0294 

h: ii. Reject the null hypothesis. 

iv. There is sufficient evidence to conclude that the distribution of technology use for statistics home- 
work is not the same for statistics students at community colleges and at universities. 

Solution to Exercise 11.9.20 (p. 481) 

c: 3

d: Chi-Square with df = 3

e: 11.75 

f: p-value = 0.0083 

h: ii. Reject the null hypothesis. 

iv. There is sufficient evidence to conclude that the distribution of fish in Green Valley Lake is not the 

same as the distribution of fish in Echo Lake. 

Solution to Exercise 11.9.21 (p. 481) 

c. 83 

d. Chi-Square with df = 83 

e. 96.81 

f. p-value = 0.1426; There is a 0.1426 probability that the sample standard deviation is 0.54 or more. 

h. Decision: Do not reject null; Conclusion: There is insufficient evidence to conclude that the standard 
deviation is more than 0.5 oz. It cannot be determined whether the equipment needs to be recalibrated 
or not. 
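
The test statistic in part e follows from the single-variance formula, chi-square = (n - 1)s²/σ². A minimal sketch, with a function name chosen only for illustration:

```python
def variance_test_statistic(n, sample_sd, hypothesized_sd):
    """Chi-square statistic for a test of a single standard deviation."""
    return (n - 1) * sample_sd ** 2 / hypothesized_sd ** 2


# 84 boxes, sample standard deviation 0.54 oz., hypothesized value 0.5 oz.
statistic = variance_test_statistic(84, 0.54, 0.5)
df = 84 - 1
print(round(statistic, 2), df)
```

The statistic is compared with the chi-square distribution with df = 83, and the p-value is the area to its right.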

Solution to Exercise 11.9.23 (p. 481) 

c. 4 

d. Chi-Square with df = 4 

e. 4.52 

f. 0.3402 

h. Decision: Do not reject null. 

Solution to Exercise 11.9.25 (p. 482) 

c. 49 

d. Chi-Square with df = 49 

e. 54.37 

f. p-value = 0.2774; If the null hypothesis is true, there is a 0.2774 probability that the sample standard 

deviation is 0.79 or more. 
h. Decision: Do not reject null; Conclusion: There is insufficient evidence to conclude that the standard 
deviation is more than 0.75. It cannot be determined if the standard deviation is greater than 0.75 or 
not. 

Solution to Exercise 11.9.27 (p. 482) 

a. $\sigma^2 \le (1.5)^2$

c. 48 

d. Chi-Square with df = 48 

e. 85.33 

f. 0.0007 

h. Decision: Reject null. 

Solution to Exercise 11.9.28 (p. 482) 
True 

Solution to Exercise 11.9.29 (p. 482) 
False 

Solution to Exercise 11.9.30 (p. 482) 
False 

Solution to Exercise 11.9.31 (p. 482) 
True 

Solution to Exercise 11.9.32 (p. 482) 
True

Solution to Exercise 11.9.33 (p. 482)

False 

Solution to Exercise 11.9.34 (p. 483) 

True 

Solution to Exercise 11.9.35 (p. 483) 

True 

Solution to Exercise 11.9.36 (p. 483) 

True 

Solution to Exercise 11.9.37 (p. 483) 

True 

Solution to Exercise 11.9.38 (p. 483) 

True 

Solution to Exercise 11.9.39 (p. 483) 

False 

Solution to Exercise 11.9.40 (p. 483) 

False 

Solution to Exercise 11.9.41 (p. 483) 

True 



Solutions to Review 

Solution to Exercise 11.10.1 (p. 484) 
(0.0424,0.0770) 

Solution to Exercise 11.10.2 (p. 484) 
2401 
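
The answer 2401 is consistent with the sample-size formula n = z²p'q'/E², assuming the conservative choice p' = q' = 0.5 and z = 1.96 for 95% confidence. The sketch below uses exact fractions so that rounding up is not affected by floating-point error; the function name is illustrative.

```python
import math
from fractions import Fraction


def minimum_sample_size(z, margin, p=Fraction(1, 2)):
    """Smallest n for which z * sqrt(p(1 - p)/n) is at most the margin."""
    n_exact = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n_exact)


n = minimum_sample_size(Fraction("1.96"), Fraction("0.02"))
print(n)
```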

Solution to Exercise 11.10.4 (p. 484) 
7.5 

Solution to Exercise 11.10.5 (p. 484) 
0.0122 

Solution to Exercise 11.10.6 (p. 484) 
N (7, 0.63) 

Solution to Exercise 11.10.7 (p. 484) 
0.9911 
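
The four normal-distribution answers above (Exercises 11.10.4 through 11.10.7) can be reproduced with Python's standard library. This is a sketch, not the calculator method the book uses; it relies on the stated mean of 7 years and standard deviation of 2 years, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

ownership = NormalDist(mu=7, sigma=2)

# 11.10.4: the 60th percentile of ownership time.
percentile_60 = ownership.inv_cdf(0.60)

# 11.10.5: probability that one person keeps the car less than 2.5 years.
p_under_2_5 = ownership.cdf(2.5)

# 11.10.6: the average of 10 has the same mean and standard deviation 2/sqrt(10).
average_of_10 = NormalDist(mu=7, sigma=2 / sqrt(10))

# 11.10.7: the sum of 10 has mean 10(7) and standard deviation 2*sqrt(10).
sum_of_10 = NormalDist(mu=10 * 7, sigma=2 * sqrt(10))
p_sum_over_55 = 1 - sum_of_10.cdf(55)

print(round(percentile_60, 1), round(p_under_2_5, 4))
print(round(average_of_10.stdev, 2), round(p_sum_over_55, 4))
```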

Solution to Exercise 11.10.8 (p. 484) 
B 
Solution to Exercise 11.10.9 (p. 484) 

a. True 

b. False 

c. False 

Solution to Exercise 11.10.11 (p. 485) 

student-t with df = 15 

Solution to Exercise 11.10.12 (p. 485) 

(560.07,719.93) 
Solution to Exercise 11.10.13 (p. 485) 

quantitative - continuous 
Solution to Exercise 11.10.14 (p. 485) 

quantitative - discrete 

Solution to Exercise 11.10.15 (p. 485) 

b. P (4) 

c. 0.0183

Solution to Exercise 11.10.16 (p. 485)
greater than 

Solution to Exercise 11.10.17 (p. 485) 
No; P (x = 8) = 0.0348 
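
The probability quoted above is a binomial probability with n = 15 and p = 0.30. A short sketch of the calculation (the function name is illustrative):

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_exactly_8 = binomial_pmf(8, 15, 0.30)
print(round(p_exactly_8, 4))
```
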
Solution to Exercise 11.10.18 (p. 485) 
You will lose $5 
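
The expected loss can be checked by weighting each outcome by its probability. In this sketch a win nets $30 (the $10 entry fee is returned plus $30) and a loss nets -$10; exact fractions avoid rounding.

```python
from fractions import Fraction

# Two of the 2**4 equally likely sequences are all heads or all tails.
p_win = Fraction(2, 2 ** 4)
p_lose = 1 - p_win

expected_earnings = p_win * 30 + p_lose * (-10)
print(expected_earnings)
```
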

Solution to Exercise 11.10.19 (p. 485) 
Becca 
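
The comparison rests on z-scores, which measure how many standard deviations each score lies above its own class average. A minimal sketch using the numbers from the exercise:

```python
# (score, class average, class standard deviation) for each student.
students = {
    "Rachel": (80, 74, 5),
    "Becca": (51, 47, 2),
    "Matt": (83, 70, 8),
}

z_scores = {
    name: (score - mean) / sd for name, (score, mean, sd) in students.items()
}
best = max(z_scores, key=z_scores.get)
print(z_scores, best)
```
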

Solution to Exercise 11.10.20 (p. 486) 
14 
Solution to Exercise 11.10.21 (p. 486) 

Sample mean = 3.2 

Sample standard deviation = 1.85

Median = 3 

Quartile 1 = 2 

Quartile 3 = 5 

IQR = 3 

Solution to Exercise 11.10.22 (p. 486) 

d. z= -1.19 

e. 0.1171 

f. Do not reject the null

Solution to Exercise 11.10.23 (p. 487) 

We conclude that the patient does have the HIV virus when, in fact, the patient does not. 

Solution to Exercise 11.10.24 (p. 487) 

c. z = 2.21; p = 0.0136

d. Reject the null 

e. We conclude that the proportion of Californian professionals that wear jeans to work is greater than the 

proportion of non-Californian professionals when, in fact, it is not greater. 

f. We cannot conclude that the proportion of Californian professionals that wear jeans to work is greater 

than the proportion of non-Californian professionals when, in fact, it is greater. 
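
The test statistic and p-value in part c can be reproduced with a pooled two-proportion z-test. This sketch assumes the pooled standard error and a right-tailed alternative (the Californian proportion is greater); the function name is illustrative.

```python
from math import erf, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Right-tailed z-test for p1 > p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Area to the right of z under the standard normal curve.
    p_value = 0.5 * (1 - erf(z / sqrt(2)))
    return z, p_value


# 15 of 50 Californians and 6 of 50 non-Californians wear jeans to work.
z, p_value = two_proportion_z_test(15, 50, 6, 50)
print(round(z, 2), round(p_value, 4))
```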

Solution to Exercise 11.10.25 (p. 487) 
C 
Solution to Exercise 11.10.26 (p. 487) 

$t_5$

Chapter 12

F Distribution and ANOVA 

12.1 F Distribution and ANOVA 1 

12.1.1 Student Learning Objectives 

By the end of this chapter, the student should be able to: 

• Interpret the F probability distribution as the number of groups and the sample size change. 

• Discuss two uses for the F distribution, ANOVA and the test of two variances. 

• Conduct and interpret ANOVA. 

• Conduct and interpret hypothesis tests of two variances (optional). 

12.1.2 Introduction 

Many statistical applications in psychology, social science, business administration, and the natural sciences 
involve several groups. For example, an environmentalist is interested in knowing if the average amount of 
pollution varies in several bodies of water. A sociologist is interested in knowing if the amount of income a 
person earns varies according to his or her upbringing. A consumer looking for a new car might compare 
the average gas mileage of several models. 

For hypothesis tests involving more than two averages, statisticians have developed a method called
"Analysis of Variance" (abbreviated ANOVA). In this chapter, you will study the simplest form of ANOVA called
single factor or one-way ANOVA. You will also study the F distribution, used for ANOVA, and the test of
two variances. This is just a very brief overview of ANOVA. You will study this topic in much greater detail
in future statistics courses.

• ANOVA, as it is presented here, relies heavily on a calculator or computer. 

• For further information about ANOVA, use the online link ANOVA 2 . Use the back button to return 
here. (The url is http://en.wikipedia.org/wiki/Analysis_of_variance.) 



1 This content is available online at <http://cnx.org/content/m17065/1.7/>.
2 http://en.wikipedia.org/wiki/Analysis_of_variance
12.2 ANOVA 3

12.2.1 F Distribution and ANOVA: Purpose and Basic Assumption of ANOVA 

The purpose of an ANOVA test is to determine the existence of a statistically significant difference among 
several group means. The test actually uses variances to help determine if the means are equal or not. 

In order to perform an ANOVA test, there are three basic assumptions to be fulfilled: 

• Each population from which a sample is taken is assumed to be normal. 

• Each sample is randomly selected and independent. 

• The populations are assumed to have equal standard deviations (or variances). 

12.2.2 The Null and Alternate Hypotheses 

The null hypothesis is simply that all the group population means are the same. The alternate hypothesis 
is that at least one pair of means is different. For example, if there are k groups: 

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

$H_a$: At least two of the group means $\mu_1, \mu_2, \mu_3, \dots, \mu_k$ are not equal.

12.3 The F Distribution and the F Ratio 4 

The distribution used for the hypothesis test is a new one. It is called the F distribution, named after Sir 
Ronald Fisher, an English statistician. The F statistic is a ratio (a fraction). There are two sets of degrees of 
freedom; one for the numerator and one for the denominator. 

For example, if F follows an F distribution and the degrees of freedom for the numerator are 4 and the
degrees of freedom for the denominator are 10, then $F \sim F_{4,10}$.

To calculate the F ratio, two estimates of the variance are made. 

1. Variance between samples: An estimate of $\sigma^2$ that is the variance of the sample means multiplied
by n (when the sample sizes are the same). If the samples are different sizes, the variance between
samples is weighted to account for the different sample sizes. The variance is also called variation
due to treatment or explained variation.

2. Variance within samples: An estimate of $\sigma^2$ that is the average of the sample variances (also known
as a pooled variance). When the sample sizes are different, the variance within samples is weighted.
The variance is also called the variation due to error or unexplained variation.

• SSbetween — the sum of squares that represents the variation among the different samples. 

• SS within — the sum of squares that represents the variation within samples that is due to chance. 

To find a "sum of squares" means to add together squared quantities which, in some cases, may be weighted. 
We used sum of squares to calculate the sample variance and the sample standard deviation in Descriptive 
Statistics. 

MS means "mean square." $MS_{between}$ is the variance between groups and $MS_{within}$ is the variance within
groups.

Calculation of Sum of Squares and Mean Square 

3 This content is available online at <http://cnx.org/content/m17068/1.6/>.
4 This content is available online at <http://cnx.org/content/m17076/1.9/>.
• k = the number of different groups

• $n_j$ = the size of the jth group

• $s_j$ = the sum of the values in the jth group

• N = total number of all the values combined (total sample size: $\sum n_j$)

• x = one value: $\sum x = \sum s_j$

• Sum of squares of all values from every group combined: $\sum x^2$

• Total sum of squares (total variability): $SS_{total} = \sum x^2 - \frac{(\sum x)^2}{N}$

• Explained variation, the sum of squares representing variation among the different samples:
$SS_{between} = \sum \left[ \frac{(s_j)^2}{n_j} \right] - \frac{(\sum s_j)^2}{N}$

• Unexplained variation, the sum of squares representing variation within samples due to chance:
$SS_{within} = SS_{total} - SS_{between}$

• df's for different groups (df's for the numerator): $df_{between} = k - 1$

• Equation for errors within samples (df's for the denominator): $df_{within} = N - k$

• Mean square (variance estimate) explained by the different groups: $MS_{between} = \frac{SS_{between}}{df_{between}}$

• Mean square (variance estimate) that is due to chance (unexplained): $MS_{within} = \frac{SS_{within}}{df_{within}}$

$MS_{between}$ and $MS_{within}$ can be written as follows:

• $MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{SS_{between}}{k - 1}$

• $MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{SS_{within}}{N - k}$
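The sum-of-squares calculation above can be traced step by step in code. The following Python sketch is an illustration only and is not part of the original text: the function name `anova_table` and the three small groups are hypothetical, chosen simply because they have unequal sizes.

```python
def anova_table(groups):
    """Compute one-way ANOVA quantities from the sum-of-squares formulas.

    `groups` is a list of lists of numbers, one inner list per group.
    """
    k = len(groups)                                  # number of groups
    n = [len(g) for g in groups]                     # n_j: size of each group
    s = [sum(g) for g in groups]                     # s_j: sum of each group
    N = sum(n)                                       # total sample size
    sum_x = sum(s)                                   # sum of all values
    sum_x2 = sum(x * x for g in groups for x in g)   # sum of squared values

    correction = sum_x ** 2 / N                      # (sum of x)^2 / N
    ss_total = sum_x2 - correction
    ss_between = sum(sj ** 2 / nj for sj, nj in zip(s, n)) - correction
    ss_within = ss_total - ss_between

    df_between = k - 1                               # numerator df
    df_within = N - k                                # denominator df
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical data: three groups of unequal size (not from the text).
example = anova_table([[2, 4], [1, 2, 3], [5, 6, 7, 8]])
for name, value in example.items():
    print(name, round(value, 4))
```

For these made-up groups the within-group sum of squares can be checked by hand: the squared deviations from the group means 3, 2, and 6.5 add up to 2 + 2 + 5 = 9, which is what the subtraction $SS_{total} - SS_{between}$ should also give.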

The ANOVA test depends on the fact that $MS_{between}$ can be influenced by population differences among
means of the several groups. Since $MS_{within}$ compares values of each group to its own group mean, the fact
that group means might be different does not affect $MS_{within}$.

The null hypothesis says that all groups are samples from populations having the same normal distribution.
The alternate hypothesis says that at least two of the sample groups come from populations with different
normal distributions. If the null hypothesis is true, $MS_{between}$ and $MS_{within}$ should both estimate the same
value.

NOTE: The null hypothesis says that all the group population means are equal. The hypothesis of 
equal means implies that the populations have the same normal distribution because it is assumed 
that the populations are normal and that they have equal variances. 

F-Ratio or F Statistic

$F = \frac{MS_{between}}{MS_{within}}$ (12.1)

If $MS_{between}$ and $MS_{within}$ estimate the same value (following the belief that $H_0$ is true), then the F-ratio
should be approximately equal to 1. Only sampling errors would contribute to variations away from 1. As
it turns out, $MS_{between}$ consists of the population variance plus a variance produced from the differences
between the samples. $MS_{within}$ is an estimate of the population variance. Since variances are always
positive, if the null hypothesis is false, $MS_{between}$ will generally be larger than $MS_{within}$, and the F-ratio
will tend to be larger than 1.

The above calculations were done with groups of different sizes. If the groups are the same size, the
calculations simplify somewhat and the F ratio can be written as:

F-Ratio Formula when the groups are the same size

$F = \frac{n \cdot s_{\bar{x}}^2}{s_{pooled}^2}$ (12.2)

where ...

• $s_{\bar{x}}^2$ = the variance of the sample means

• n = the sample size of each group

• $s_{pooled}^2$ = the mean of the sample variances (pooled variance)

• $df_{numerator} = k - 1$

• $df_{denominator} = k(n - 1) = N - k$
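As a check on the equal-size formula, one can set every $n_j = n$ in the general definitions. The short derivation below is a sketch added for illustration, not part of the original text. It assumes extra notation that the chapter does not use: $\bar{x}_j$ for the mean of group $j$, $\bar{x}$ for the mean of the $k$ group means, and $v_j$ for the sample variance of group $j$.

```latex
\begin{aligned}
MS_{between} &= \frac{SS_{between}}{k-1}
  = \frac{n\sum_{j=1}^{k}(\bar{x}_j-\bar{x})^2}{k-1}
  = n\, s_{\bar{x}}^{2},\\[4pt]
MS_{within} &= \frac{SS_{within}}{N-k}
  = \frac{(n-1)\sum_{j=1}^{k} v_j}{k(n-1)}
  = \frac{1}{k}\sum_{j=1}^{k} v_j
  = s_{pooled}^{2},\\[4pt]
F &= \frac{MS_{between}}{MS_{within}}
  = \frac{n\, s_{\bar{x}}^{2}}{s_{pooled}^{2}}.
\end{aligned}
```

The first line uses the fact that, with equal group sizes, the overall mean equals the mean of the group means; the second uses $N = kn$.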

The ANOVA hypothesis test is always right-tailed because larger F-values are way out in the right tail of
the F-distribution curve and tend to make us reject $H_0$.

12.3.1 Notation 

The notation for the F distribution is $F \sim F_{df(num),df(denom)}$
where $df(num) = df_{between}$ and $df(denom) = df_{within}$

The mean for the F distribution is $\mu = \frac{df(denom)}{df(denom) - 2}$, provided $df(denom) > 2$.
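A small simulation can make two of the claims above concrete: when the null hypothesis is true the F ratio hovers near 1, and its long-run average is close to the mean of the F distribution, df(denom)/(df(denom) - 2). The Python sketch below is an illustration and not part of the original text. The function names, the seed, the choice of k = 4 groups of n = 5, and the standard normal population are all assumptions made for the example. With these choices the distribution is $F_{3,16}$, whose mean is 16/14, or about 1.14.

```python
import random
import statistics


def f_ratio_equal_sizes(groups):
    """F ratio for equal-size groups: n * var(group means) / mean(group variances)."""
    n = len(groups[0])
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    return n * statistics.variance(means) / statistics.mean(variances)


def simulate_null_f(k=4, n=5, reps=2000, seed=1):
    """Draw k groups of size n from one normal population, reps times."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        ratios.append(f_ratio_equal_sizes(groups))
    return ratios


ratios = simulate_null_f()
average_f = statistics.mean(ratios)
share_above_one = sum(r > 1 for r in ratios) / len(ratios)
print("average F ratio:", round(average_f, 2))
print("share of F ratios above 1:", round(share_above_one, 2))
```

Because every simulated group comes from the same population, any F ratio far above 1 here is due to sampling variation alone, which is exactly the situation a p-value measures.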

12.4 Facts About the F Distribution 5 

1. The curve is not symmetrical but skewed to the right. 

2. There is a different curve for each set of dfs. 

3. The F statistic is greater than or equal to zero. 

4. As the degrees of freedom for the numerator and for the denominator get larger, the curve
approximates the normal.

5. Other uses for the F distribution include comparing two variances and Two-Way Analysis of Variance. 
Comparing two variances is discussed at the end of the chapter. Two-Way Analysis is mentioned for 
your information only. 



5 This content is available online at <http://cnx.org/content/m17062/1.11/>.
Figure 12.1 (graph not reproduced; surviving labels: panel (a), curve label 10,25)



Example 12.1 

One-Way ANOVA: Four sororities took a random sample of sisters regarding their grade averages
for the past term. The results are shown below:

GRADE AVERAGES FOR FOUR SORORITIES

Sorority 1    Sorority 2    Sorority 3    Sorority 4
2.17          2.63          2.63          3.79
1.85          1.77          3.78          3.45
2.83          3.25          4.00          3.08
1.69          1.86          2.55          2.26
3.33          2.21          2.45          3.18

Table 12.1

Problem 

Using a significance level of 1%, is there a difference in grade averages among the sororities? 

Solution 

Let $\mu_1, \mu_2, \mu_3, \mu_4$ be the population means of the sororities. Remember that the null hypothesis
claims that the sorority groups are from the same normal distribution. The alternate hypothesis
says that at least two of the sorority groups come from populations with different normal
distributions. Notice that the four sample sizes are each size 5.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_a$: Not all of the means $\mu_1, \mu_2, \mu_3, \mu_4$ are equal.

Distribution for the test: $F_{3,16}$

where k = 4 groups and N = 20 values in total

df(num) = k - 1 = 4 - 1 = 3

df(denom) = N - k = 20 - 4 = 16

Calculate the test statistic: F = 2.23
Graph: (figure not reproduced; it marks p-value = 0.1241)

Figure 12.2

Probability statement: p-value = P(F > 2.23) = 0.1241

Compare α and the p-value: α = 0.01, p-value = 0.1241, so α < p-value.

Make a decision: Since α < p-value, you cannot reject $H_0$.

This means that the population averages appear to be the same.

Conclusion: There is not sufficient evidence to conclude that there is a difference among the grade 
averages for the sororities. 

TI-83+ or TI-84: Put the data into lists L1, L2, L3, and L4. Press STAT and arrow over to TESTS.
Arrow down to F: ANOVA. Press ENTER and enter (L1,L2,L3,L4). The F statistic is 2.2303 and the
p-value is 0.1241. df(numerator) = 3 (under "Factor") and df(denominator) = 16 (under "Error").
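For readers working on a computer rather than a calculator, the same test can be sketched in Python using only the standard library. This is an illustration, not part of the original text: the function names `one_way_anova` and `f_right_tail` are hypothetical, and the p-value is approximated by Simpson's rule applied to the incomplete beta integral rather than taken from a statistics package (library routines such as SciPy's `scipy.stats.f_oneway` perform the same test directly). The tail routine assumes a positive F statistic and at least 2 denominator degrees of freedom.

```python
import math


def one_way_anova(groups):
    """Return (F statistic, df numerator, df denominator) for a one-way ANOVA."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_num, df_denom = k - 1, N - k
    f_stat = (ss_between / df_num) / (ss_within / df_denom)
    return f_stat, df_num, df_denom


def f_right_tail(f_stat, df_num, df_denom, steps=2000):
    """Approximate P(F > f_stat) with Simpson's rule (steps must be even).

    Uses P(F > f) = I_x(df_denom / 2, df_num / 2), where
    x = df_denom / (df_denom + df_num * f) and I_x is the regularized
    incomplete beta function. Assumes f_stat > 0 and df_denom >= 2.
    """
    a, b = df_denom / 2.0, df_num / 2.0
    x = df_denom / (df_denom + df_num * f_stat)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


# Grade averages from Table 12.1, one list per sorority.
sororities = [
    [2.17, 1.85, 2.83, 1.69, 3.33],
    [2.63, 1.77, 3.25, 1.86, 2.21],
    [2.63, 3.78, 4.00, 2.55, 2.45],
    [3.79, 3.45, 3.08, 2.26, 3.18],
]
f_stat, df_num, df_denom = one_way_anova(sororities)
p_value = f_right_tail(f_stat, df_num, df_denom)
print("F =", round(f_stat, 4), "df =", (df_num, df_denom))
print("p-value is approximately", round(p_value, 4))
```

The printed values should agree with the calculator output quoted above to within rounding.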



Example 12.2 

A fourth grade class is studying the environment. One of the assignments is to grow bean plants 
in different soils. Tommy chose to grow his bean plants in soil found outside his classroom mixed 
with dryer lint. Tara chose to grow her bean plants in potting soil bought at the local nursery. 
Nick chose to grow his bean plants in soil from his mother's garden. No chemicals were used 
on the plants, only water. They were grown inside the classroom next to a large window. Each 
child grew 5 plants. At the end of the growing period, each plant was measured, producing the 
following data (in inches): 
Tommy's Plants    Tara's Plants    Nick's Plants
24                25               23
21                31               27
23                23               22
30                20               30
23                28               20

Table 12.2

Problem 1 

Does it appear that the three media in which the bean plants were grown produce the same 
average height? Test at a 3% level of significance. 

Solution 

This time, we will perform the calculations that lead to the F statistic. Notice that each group has
the same number of plants so we will use the formula $F = \frac{n \cdot s_{\bar{x}}^2}{s_{pooled}^2}$.

First, calculate the sample mean and sample variance of each group. 





                   Tommy's Plants    Tara's Plants    Nick's Plants
Sample Mean        24.2              25.4             24.4
Sample Variance    11.7              18.3             16.3

Table 12.3

Next, calculate the variance of the three group means (calculate the variance of 24.2, 25.4, and
24.4). Variance of the group means = 0.413 = $s_{\bar{x}}^2$

Then $MS_{between} = n \cdot s_{\bar{x}}^2 = (5)(0.413)$, where n = 5 is the sample size (number of plants each
child grew).

Calculate the average of the three sample variances (calculate the average of 11.7, 18.3, and 16.3).

Average of the sample variances = 15.433 = $s_{pooled}^2$

Then $MS_{within} = s_{pooled}^2 = 15.433$.

The F statistic (or F ratio) is
$F = \frac{MS_{between}}{MS_{within}} = \frac{n \cdot s_{\bar{x}}^2}{s_{pooled}^2} = \frac{(5)(0.413)}{15.433} = 0.134$

The dfs for the numerator = the number of groups - 1 = 3 - 1 = 2

The dfs for the denominator = the total number of samples - the number of groups = 15 - 3 = 12

The distribution for the test is $F_{2,12}$ and the F statistic is F = 0.134.
The p-value is P(F > 0.134) = 0.8759.
Decision: Since α = 0.03 and the p-value = 0.8759, do not reject $H_0$. (Why?)

Conclusion: At the 3% level of significance, from the sample data, the evidence is not sufficient
to conclude that the average heights of the bean plants are different. Of the three media tested,
it appears that it does not matter which one the bean plants are grown in.
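The hand calculation for the three original bean-plant groups can be reproduced with Python's standard `statistics` module. This sketch is an illustration and not part of the original text; the variable names are hypothetical. The p-value line relies on a shortcut that holds only when the numerator has exactly 2 degrees of freedom, where P(F > f) = (1 + 2f/df(denom)) raised to the power -df(denom)/2; it should not be reused for other numerators.

```python
import statistics

# Plant heights in inches from Table 12.2.
tommy = [24, 21, 23, 30, 23]
tara = [25, 31, 23, 20, 28]
nick = [23, 27, 22, 30, 20]
groups = [tommy, tara, nick]

n = len(tommy)                                   # plants per child
group_means = [statistics.mean(g) for g in groups]
group_variances = [statistics.variance(g) for g in groups]

ms_between = n * statistics.variance(group_means)   # n times variance of the means
ms_within = statistics.mean(group_variances)        # pooled variance (equal sizes)
f_stat = ms_between / ms_within

df_num = len(groups) - 1                         # 3 - 1 = 2
df_denom = n * len(groups) - len(groups)         # 15 - 3 = 12

# Closed form for the right tail, valid only because df_num == 2.
p_value = (1 + df_num * f_stat / df_denom) ** (-df_denom / 2)

print("group means:", group_means)
print("group variances:", group_variances)
print("F =", round(f_stat, 3), " p-value =", round(p_value, 4))
```

Comparing `p_value` with α = 0.03 gives the same decision as above: the p-value is far larger than α, so the null hypothesis is not rejected.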

(This experiment was actually done by three classmates of the son of one of the authors.) 

Another fourth grader also grew bean plants but this time in a jelly-like mass. The heights were 
(in inches) 24, 28, 25, 30, and 32. 

Problem 2 (Solution on p. 520.) 

Do an ANOVA test on the 4 groups. You may use your calculator or computer to perform the 
test. Are the heights of the bean plants different? Use a solution sheet (Section 13.5.4). 



12.4.1 Optional Classroom Activity 

Randomly divide the class into four groups of the same size. Have each member of each group record the
number of states in the United States he or she has visited. Run an ANOVA test to determine if the average
numbers of states visited in the four groups are the same. Test at a 1% level of significance. Use one of the
solution sheets (Section 13.5.4) at the end of the chapter (after the homework).
12.5 Summary 6

• An ANOVA hypothesis test determines if several population means are equal. The distribution for 
the test is the F distribution with 2 different degrees of freedom. 

Assumptions: 

a. Each population from which a sample is taken is assumed to be normal. 

b. Each sample is randomly selected and independent. 

c. The populations are assumed to have equal standard deviations (or variances) 

• A Test of Two Variances hypothesis test determines if two variances are the same. The distribution 
for the hypothesis test is the F distribution with 2 different degrees of freedom. 

Assumptions: 

a. The populations from which the two samples are drawn are normally distributed. 

b. The two populations are independent of each other. 



6 This content is available online at <http://cnx.org/content/m17072/1.3/>.
12.6 Practice: ANOVA 7

12.6.1 Student Learning Outcome 

• The student will explore the properties of ANOVA. 

12.6.2 Given 

Suppose a group is interested in determining whether teenagers obtain their driver's licenses at
approximately the same average age across the country. Suppose that the following data are randomly collected
from four teenagers in each region of the country. The numbers represent the age at which teenagers
obtained their driver's licenses.





               Northeast    South    West    Central    East
               16.3         16.9     16.4    16.2       17.1
               16.1         16.5     16.5    16.6       17.2
               16.4         16.4     16.6    16.5       16.6
               16.5         16.2     16.1    16.4       16.8
$\bar{x}$ =    ____         ____     ____    ____       ____
$s^2$ =        ____         ____     ____    ____       ____

Table 12.4

12.6.3 Hypothesis

Exercise 12.6.1

State the hypotheses.

$H_0$:
$H_a$:



12.6.4 Data Entry 

Enter the data into your calculator or computer. 

Exercise 12.6.2 (Solution on p. 520.)

degrees of freedom - numerator: df(n) =

Exercise 12.6.3 (Solution on p. 520.)

degrees of freedom - denominator: df(d) =

Exercise 12.6.4 (Solution on p. 520.)

F test statistic =

Exercise 12.6.5 (Solution on p. 520.)

p-value =



7 This content is available online at <http://cnx.org/content/m17067/1.8/>.
12.6.5 Decisions and Conclusions

State the decisions and conclusions (in complete sentences) for the following preconceived levels of α.

Exercise 12.6.6

α = 0.05

Decision:

Conclusion:

Exercise 12.6.7

α = 0.01

Decision:

Conclusion:
12.7 Homework 8

DIRECTIONS: Use a solution sheet to conduct the following hypothesis tests. The solution sheet 
can be found in the Table of Contents 14. Appendix. 

Exercise 12.7.1 (Solution on p. 520.) 

Three students, Linda, Tuan, and Javier, are given 5 laboratory rats each for a nutritional experi- 
ment. Each rat's weight is recorded in grams. Linda feeds her rats Formula A, Tuan feeds his rats 
Formula B, and Javier feeds his rats Formula C. At the end of a specified time period, each rat is 
weighed again and the net gain in grams is recorded. Using a significance level of 10%, test the 
hypothesis that the three formulas produce the same average weight gain. 

Weights of Student Lab Rats

Linda's rats    Tuan's rats    Javier's rats
43.5            47.0           51.2
39.4            40.5           40.9
41.3            38.9           37.9
46.0            46.3           45.0
38.2            44.2           48.6

Table 12.5

Exercise 12.7.2 

A grassroots group opposed to a proposed increase in the gas tax claimed that the increase
would hurt working-class people the most, since they commute the farthest to work. Suppose
that the group randomly surveyed 24 individuals and asked them their daily one-way commuting
mileage. The results are below:

working-class    professional (middle incomes)    professional (wealthy)
17.8             16.5                             8.5
26.7             17.4                             6.3
49.4             22.0                             4.6
9.4              7.4                              12.6
65.4             9.4                              11.0
47.1             2.1                              28.6
19.5             6.4                              15.4
51.2             13.9                             9.3

Table 12.6



Exercise 12.7.3 (Solution on p. 520.) 

Refer to Exercise 12.7.1. Determine whether or not the variance in weight gain is statistically the
same among Javier's and Linda's rats.



8 This content is available online at <http://cnx.org/content/m17063/1.9/>.

Exercise 12.7.4

Refer to Exercise 12.7.2 above. Determine whether or not the variance in mileage
driven is statistically the same among the working class and professional (middle income) groups.

For the next two problems, refer to the data from Terri Vogel's Log Book [link pending]. 

Exercise 12.7.5 (Solution on p. 520.) 

Examine the 7 practice laps. Determine whether the average lap time is statistically the same for 
the 7 practice laps, or if there is at least one lap that has a different average time from the others. 

Exercise 12.7.6 

Examine practice laps 3 and 4. Determine whether or not the variance in lap time is statistically 
the same for those practice laps. 

For the next four problems, refer to the following data. 

The following table lists the number of pages in four different types of magazines. 



home decorating    news    health    computer
172                87      82        104
286                94      153       136
163                123     87        98
205                106     103       207
197                101     96        146

Table 12.7



Exercise 12.7.7 (Solution on p. 520.) 

Using a significance level of 5%, test the hypothesis that the four magazine types have the same 
average length. 

Exercise 12.7.8 

Eliminate one magazine type that you now feel has an average length different than the others. 
Redo the hypothesis test, testing that the remaining three averages are statistically the same. Use a 
new solution sheet. Based on this test, are the average lengths for the remaining three magazines 
statistically the same? 

Exercise 12.7.9 

Which two magazine types do you think have the same variance in length? 

Exercise 12.7.10 

Which two magazine types do you think have different variances in length? 
12.8 Review 9

The next two questions refer to the following situation: 

Suppose that the probability of a drought in any independent year is 20%. Out of those years in which a 
drought occurs, the probability of water rationing is 10%. However, in any year, the probability of water 
rationing is 5%. 

Exercise 12.8.1 (Solution on p. 521.) 

What is the probability of both a drought and water rationing occurring? 

Exercise 12.8.2 (Solution on p. 521.) 

Out of the years with water rationing, find the probability that there is a drought. 

The next three questions refer to the following survey: 

Favorite Type of Pie by Gender 





          apple    pumpkin    pecan
female    40       10         30
male      20       30         10

Table 12.8

Exercise 12.8.3 (Solution on p. 521.) 

Suppose that one individual is randomly chosen. Find the probability that the person's favorite 
pie is apple or the person is male. 

Exercise 12.8.4 (Solution on p. 521.) 

Suppose that one male is randomly chosen. Find the probability his favorite pie is pecan. 

Exercise 12.8.5 (Solution on p. 521.) 

Conduct a hypothesis test to determine if favorite pie type and gender are independent. 

The next two questions refer to the following situation: 

Let's say that the probability that an adult watches the news at least once per week is 0.60. 

Exercise 12.8.6 (Solution on p. 521.) 

We randomly survey 14 people. On average, how many people do we expect to watch the news 
at least once per week? 

Exercise 12.8.7 (Solution on p. 521.) 

We randomly survey 14 people. Of interest is the number that watch the news at least once per 
week. State the distribution of X. X ~ 

Exercise 12.8.8 (Solution on p. 521.) 

The following histogram is most likely to be a result of sampling from which distribution? 



9 This content is available online at <http://cnx.org/content/m17070/1.8/>.
Figure 12.3



A. Chi-Square 

B. Geometric 

C. Uniform 

D. Binomial 

Exercise 12.8.9 

The ages of De Anza evening students are known to be normally distributed. A sample of 6 De
Anza evening students reported their ages (in years) as: 28; 35; 47; 45; 30; 50. Find the probability
that the average of 6 ages of randomly chosen students is less than 35 years.

The next three questions refer to the following situation: 

The amount of money a customer spends in one trip to the supermarket is known to have an exponential
distribution. Suppose the average amount of money a customer spends in one trip to the supermarket is $72.

Exercise 12.8.10 (Solution on p. 521.)

Find the probability that one customer spends less than $72 in one trip to the supermarket.

Exercise 12.8.11 (Solution on p. 521.) 

Suppose 5 customers pool their money. (They are poor college students.) How much money 
altogether would you expect the 5 customers to spend in one trip to the supermarket (in dollars)? 

Exercise 12.8.12 (Solution on p. 521.) 

State the distribution to use if you want to find the probability that the average amount spent
by 5 customers in one trip to the supermarket is less than $60.

Exercise 12.8.13 (Solution on p. 521.) 

A math exam was given to all the fifth grade children attending Country School. Two random 
samples of scores were taken. The null hypothesis is that the average math scores for boys and 
girls in fifth grade are the same. Conduct a hypothesis test. 





         n     $\bar{x}$    $s^2$
Boys     55    82           29
Girls    60    86           46

Table 12.9
Exercise 12.8.14 (Solution on p. 521.)

In a survey of 80 males, 55 had played an organized sport growing up. Of the 70 females surveyed, 
25 had played an organized sport growing up. We are interested in whether the proportion for 
males is higher than the proportion for females. Conduct a hypothesis test. 

Exercise 12.8.15 (Solution on p. 521.) 

Which of the following is preferable when designing a hypothesis test? 

A. Maximize α and minimize β

B. Minimize α and maximize β

C. Maximize α and β

D. Minimize α and β

The next three questions refer to the following situation: 

120 people were surveyed as to their favorite beverage (non-alcoholic). The results are below. 

Preferred Beverage by Age 





          0-9    10-19    20-29    30+    Totals
Milk      14     10       6        0      30
Soda      3      8        26       15     52
Juice     7      12       12       7      38
Totals    24     30       44       22     120

Table 12.10



Exercise 12.8.16 (Solution on p. 521.)

Are the events of milk and 30+:

a. Independent events? Justify your answer.

b. Mutually exclusive events? Justify your answer.

Exercise 12.8.17 (Solution on p. 521.)

Suppose that one person is randomly chosen. Find the probability that person is 10 - 19 given
that he/she prefers juice.

Exercise 12.8.18 (Solution on p. 521.)

Are Preferred Beverage and Age independent events? Conduct a hypothesis test. 

Exercise 12.8.19 (Solution on p. 521.) 

Given the following histogram, which distribution is the data most likely to come from? 
Figure 12.4



A. uniform 

B. exponential 

C. normal 

D. chi-square 
12.9 Lab: ANOVA 10

Class Time: 
Names: 

12.9.1 Student Learning Outcome: 

• The student will conduct a simple ANOVA test involving three variables. 

12.9.2 Collect the Data 

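1. Record the price per pound of 8 fruits, 8 vegetables, and 8 breads in your local supermarket.

Fruits    Vegetables    Breads
____      ____          ____
____      ____          ____
____      ____          ____
____      ____          ____
____      ____          ____
____      ____          ____
____      ____          ____
____      ____          ____

Table 12.11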

2. Explain how you could try to collect the data randomly. 



12.9.3 Analyze the Data and Conduct a Hypothesis Test 

1. Compute the following:

a. Fruit:

i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

b. Vegetables:

i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

c. Bread:

i. $\bar{x}$ =

ii. $s_x$ =

iii. n =



10 This content is available online at <http://cnx.org/content/m17061/1.8/>.

2. Find the following:

a. df(num) =

b. df (denom) = 

3. State the approximate distribution for the test. 

4. Test statistic: F = 

5. Sketch a graph of this situation. CLEARLY, label and scale the horizontal axis and shade the region(s) 
corresponding to the p-value. 

6. p-value = 

7. Test at α = 0.05. State your decision and conclusion.

8. a. Decision: Why did you make this decision? 

b. Conclusion (write a complete sentence). 

c. Based on the results of your study, is there a need to further investigate any of the food groups' 

prices? Why or why not? 
Solutions to Exercises in Chapter 12

Solution to Example 12.2, Problem 2 (p. 508) 

• F = 0.9496 

• p-value = 0.4401

The heights of the bean plants are the same. 

Solutions to Practice: ANOVA 

Solution to Exercise 12.6.2 (p. 510)
df(n) = 4

Solution to Exercise 12.6.3 (p. 510)
df(d) = 15

Solution to Exercise 12.6.4 (p. 510) 
Test statistic = F = 4.22 
Solution to Exercise 12.6.5 (p. 510) 
0.017 

Solutions to Homework 
Solution to Exercise 12.7.1 (p. 512) 

a. $H_0: \mu_L = \mu_T = \mu_J$
c. df(n) = 2; df(d) = 12

e. 0.67 

f. 0.5305 

h. Decision: Do not reject null; Conclusion: Means are same 

Solution to Exercise 12.7.3 (p. 512) 

c. df(n) = 4; df(d) = 4

e. 3.00 

f. 2 (0.1563) = 0.3126 

h. Decision: Do not reject null; Conclusion: Variances are same 

Solution to Exercise 12.7.5 (p. 513) 

c. df(n) = 6;df(d) =98 

e. 1.69 

f. 0.1319 

h. Decision: Do not reject null; Conclusion: Average lap times are the same 

Solution to Exercise 12.7.7 (p. 513) 

a. $H_0: \mu_d = \mu_n = \mu_h = \mu_c$

b. At least one average is different 

c. df (n) =3;df (d) = 16 

e. 8.69 

f. 0.0012 

h. Decision: Reject null; Conclusion: At least one average is different 
Solutions to Review

Solution to Exercise 12.8.1 (p. 514) 
0.02 

Solution to Exercise 12.8.2 (p. 514) 
0.40 
Solution to Exercise 12.8.3 (p. 514) 

100/140

Solution to Exercise 12.8.4 (p. 514) 

10/60

Solution to Exercise 12.8.5 (p. 514) 
p-value = 0; Reject null; Conclude dependent events 
Solution to Exercise 12.8.6 (p. 514) 
8.4 

Solution to Exercise 12.8.7 (p. 514) 
B(14, 0.60)

Solution to Exercise 12.8.8 (p. 514) 
D 

Solution to Exercise 12.8.10 (p. 515) 
0.6321 

Solution to Exercise 12.8.11 (p. 515) 
$360 
Solution to Exercise 12.8.12 (p. 515) 

$N\left(72, \frac{72}{\sqrt{5}}\right)$

Solution to Exercise 12.8.13 (p. 515) 

p-value = 0.0006; Reject null; Conclude averages are not equal 
Solution to Exercise 12.8.14 (p. 516) 

p-value = 0; Reject null; Conclude proportion of males is higher 
Solution to Exercise 12.8.15 (p. 516) 
D 
Solution to Exercise 12.8.16 (p. 516) 

a. No 

b. Yes, P(M and 30+) = 0

Solution to Exercise 12.8.17 (p. 516) 

12/38

Solution to Exercise 12.8.18 (p. 516) 

No; p-value = 0

Solution to Exercise 12.8.19 (p. 516) 

A 
Appendix

13.1 Practice Final Exam 1 1

Questions 1-2 refer to the following: 

An experiment consists of tossing two 12-sided dice (the numbers 1-12 are printed on the sides of each
die).

• Let Event A = both dice show an even number 

• Let Event B = both dice show a number more than 8 

Exercise 13.1.1 (Solution on p. 577.) 

Events A and B are: 

A. Mutually exclusive. 

B. Independent. 

C. Mutually exclusive and independent. 

D. Neither mutually exclusive nor independent. 

Exercise 13.1.2 (Solution on p. 577.) 

Find P(A|B)

A. [?]/4

B. 16/144

C. [?]/16

D. [?]/144

Exercise 13.1.3 (Solution on p. 577.) 

Which of the following are TRUE when we perform a hypothesis test on matched or paired sam- 
ples? 

A. Sample sizes are almost never small. 

B. Two measurements are drawn from the same pair of individuals or objects. 

C. Two sample averages are compared to each other. 

D. Answer choices B and C are both true. 

Questions 4-5 refer to the following: 

118 students were asked what type of color their bedrooms were painted: light colors, dark colors or vibrant 
colors. The results were tabulated according to gender. 



1 This content is available online at <http://cnx.org/content/m16304/1.16/>.

          Light colors    Dark colors    Vibrant colors
Female    20              22             28
Male      10              30             8

Table 13.1

Exercise 13.1.4 (Solution on p. 577.) 

Find the probability that a randomly chosen student is male or has a bedroom painted with light 
colors. 

A. 10/[?]

B. 68/118

C. [?]/118

D. [?]/48

Exercise 13.1.5 (Solution on p. 577.) 

Find the probability that a randomly chosen student is male given the student's bedroom is 
painted with dark colors. 



A. 30/118

B. 30/48

C. 22/118

D. 30/52


Questions 6-7 refer to the following: 

We are interested in the number of times a teenager must be reminded to do his/her chores each week. A 
survey of 40 mothers was conducted. The table below shows the results of the survey. 



x    P(x)
0    2/40
1    5/40
2
3    14/40
4    7/40
5    4/40

Table 13.2



Exercise 13.1.6 (Solution on p. 577.)

Find the probability that a teenager is reminded 2 times.

A. 8

B. 8/40

D. 2



Exercise 13.1.7 (Solution on p. 577.)

Find the expected number of times a teenager is reminded to do his/her chores.

A. 15

B. 2.78

C. 1.0

D. 3.13
Questions 8-9 refer to the following:

On any given day, approximately 37.5% of the cars parked in the De Anza parking structure are parked 
crookedly. (Survey done by Kathy Plum.) We randomly survey 22 cars. We are interested in the number of 
cars that are parked crookedly. 

Exercise 13.1.8 (Solution on p. 577.) 

For every 22 cars, how many would you expect to be parked crookedly, on average? 

A. 8.25 

B. 11 

C. 18 

D. 7.5 

Exercise 13.1.9 (Solution on p. 577.) 

What is the probability that at least 10 of the 22 cars are parked crookedly?

A. 0.1263 

B. 0.1607 

C. 0.2870 

D. 0.8393 

Exercise 13.1.10 (Solution on p. 577.) 

Using a sample of 15 Stanford-Binet IQ scores, we wish to conduct a hypothesis test. Our claim 
is that the average IQ score on the Stanford-Binet IQ test is more than 100. It is known that the 
standard deviation of all Stanford-Binet IQ scores is 15 points. The correct distribution to use for 
the hypothesis test is: 

A. Binomial 

B. Student-t 

C. Normal 

D. Uniform 

Questions 11 - 13 refer to the following: 

De Anza College keeps statistics on the pass rate of students who enroll in math classes. In a sample of 1795
students enrolled in Math 1A (1st quarter calculus), 1428 passed the course. In a sample of 856 students
enrolled in Math 1B (2nd quarter calculus), 662 passed. In general, are the pass rates of Math 1A and Math
1B statistically the same? Let A = the subscript for Math 1A and B = the subscript for Math 1B.

Exercise 13.1.11 (Solution on p. 577.) 

If you were to conduct an appropriate hypothesis test, the alternate hypothesis would be: 

A. H_a: p_A = p_B

B. H_a: p_A > p_B

C. H_0: p_A = p_B

D. H_a: p_A ≠ p_B

Exercise 13.1.12 (Solution on p. 577.) 

The Type I error is to: 
A. believe that the pass rate for Math 1A is the same as the pass rate for Math 1B when, in fact, the pass rates are different.

B. believe that the pass rate for Math 1A is different than the pass rate for Math 1B when, in fact, the pass rates are the same.

C. believe that the pass rate for Math 1A is greater than the pass rate for Math 1B when, in fact, the pass rate for Math 1A is less than the pass rate for Math 1B.

D. believe that the pass rate for Math 1A is the same as the pass rate for Math 1B when, in fact, they are the same.



Exercise 13.1.13 (Solution on p. 577.)

The correct decision is to:

A. reject H_0

B. not reject H_0

C. not make a decision because of lack of information

Kia, Alejandra, and Iris are runners on the track teams at three different schools. Their running times, in 
minutes, and the statistics for the track teams at their respective schools, for a one mile run, are given in the 
table below: 





            Running Time   School Average Running Time   School Standard Deviation
Kia         4.9            5.2                           .15
Alejandra   4.2            4.6                           .25
Iris        4.5            4.9                           .12

Table 13.3

Exercise 13.1.14 (Solution on p. 577.) 

Which student is the BEST when compared to the other runners at her school? 

A. Kia 

B. Alejandra 

C. Iris 

D. Impossible to determine 
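
Comparing runners from different schools is a z-score problem: each time is measured in standard deviations from that school's average. The sketch below computes z = (time - school average) / school standard deviation for Table 13.3. It assumes that a lower running time is better, so the most negative z-score marks the strongest runner relative to her school; the variable names are illustrative.

```python
# (running time, school average, school standard deviation) from Table 13.3
runners = {
    "Kia": (4.9, 5.2, 0.15),
    "Alejandra": (4.2, 4.6, 0.25),
    "Iris": (4.5, 4.9, 0.12),
}

# z-score: how many standard deviations a time is from the school average.
z_scores = {name: (time - avg) / sd for name, (time, avg, sd) in runners.items()}

# Lower times are better, so the smallest (most negative) z-score stands out.
fastest_relative = min(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("Most negative z-score:", fastest_relative)
```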

Questions 15 - 16 refer to the following: 

The following adult ski sweater prices are from the Gorsuch Ltd. Winter catalog: 

{$212, $292, $278, $199, $280, $236}

Assume the underlying sweater price population is approximately normal. The null hypothesis is that the 
average price of adult ski sweaters from Gorsuch Ltd. is at least $275. 

Exercise 13.1.15 (Solution on p. 577.) 

The correct distribution to use for the hypothesis test is: 

A. Normal 

B. Binomial 

C. Student-t 

D. Exponential 
Exercise 13.1.16 (Solution on p. 577.)

The hypothesis test: 

A. is two-tailed 

B. is left-tailed 

C. is right-tailed 

D. has no tails 

Exercise 13.1.17 (Solution on p. 577.) 

Sara, a statistics student, wanted to determine the average number of books that college professors 
have in their office. She randomly selected 2 buildings on campus and asked each professor in the 
selected buildings how many books are in his/her office. Sara surveyed 25 professors. The type 
of sampling selected is a: 

A. simple random sampling 

B. systematic sampling 

C. cluster sampling 

D. stratified sampling 

Exercise 13.1.18 (Solution on p. 577.) 

A clothing store would use which measure of the center of data when placing orders? 

A. Mean 

B. Median 

C. Mode 

D. IQR 



Exercise 13.1.19 (Solution on p. 577.)

In a hypothesis test, the p-value is

A. the probability that an outcome of the data will happen purely by chance when the null hypothesis is true.

B. called the preconceived alpha. 

C. compared to beta to decide whether to reject or not reject the null hypothesis. 

D. Answer choices A and B are both true. 

Questions 20 - 22 refer to the following: 

A community college offers classes 6 days a week: Monday through Saturday. Maria conducted a study 
of the students in her classes to determine how many days per week the students who are in her classes 
come to campus for classes. In each of her 5 classes she randomly selected 10 students and asked them 
how many days they come to campus for classes. The results of her survey are summarized in the table 
below. 



Number of Days on Campus   Frequency   Relative Frequency   Cumulative Relative Frequency
1                          2
2                          12          .24
3                          10          .20
4                                                           .98
5
6                          1           .02                  1.00

Table 13.4

Exercise 13.1.20 (Solution on p. 577.)

Combined with convenience sampling, what other sampling technique did Maria use?

A. simple random

B. systematic

C. cluster

D. stratified

Exercise 13.1.21 (Solution on p. 577.)

How many students come to campus for classes 4 days a week?

A. 49 

B. 25 

C. 30 

D. 13 

Exercise 13.1.22 (Solution on p. 577.)

What is the 60th percentile for this data?

A. 2

B. 3

C. 4

D. 5
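
The blank cells of Table 13.4 can be recovered from the entries that are given, because a cumulative relative frequency is a running total divided by the sample size. The sketch below assumes n = 50 (5 classes of 10 students) and uses one common percentile convention: the smallest value whose cumulative relative frequency reaches the target. The helper `percentile_value` is written for this illustration.

```python
n = 5 * 10  # 5 classes, 10 students sampled in each

# Frequencies that Table 13.4 states directly.
freq = {1: 2, 2: 12, 3: 10, 6: 1}
cum_rel_at_4 = 0.98  # cumulative relative frequency given for 4 days

# Students with at most 4 days, minus students with at most 3 days.
freq[4] = round(cum_rel_at_4 * n) - (freq[1] + freq[2] + freq[3])
# Whatever is left over belongs to 5 days.
freq[5] = n - sum(freq.values())


def percentile_value(freq, n, pct):
    """Smallest value whose cumulative relative frequency is at least pct."""
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if running / n >= pct:
            return value
    return max(freq)


print("frequency for 4 days:", freq[4])
print("frequency for 5 days:", freq[5])
print("60th percentile:", percentile_value(freq, n, 0.60))
```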



The next two questions refer to the following: 

The following data are the results of a random survey of 110 Reservists called to active duty to increase 
security at California airports. 



Number of Dependents   Frequency
0                      11
1                      27
2                      33
3                      20
4                      19

Table 13.5

Exercise 13.1.23 (Solution on p. 577.) 

Construct a 95% Confidence Interval for the true population average number of dependents of 
Reservists called to active duty to increase security at California airports. 

A. (1.85,2.32) 

B. (1.80,2.36) 

C. (1.97,2.46) 

D. (1.92,2.50) 
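
A confidence interval for a mean built from a frequency table needs the sample mean, the sample standard deviation, and a critical value. The sketch below works from Table 13.5, assuming the first row is 0 dependents and using a normal critical value as an approximation (a Student-t critical value with 109 degrees of freedom would give a slightly wider interval). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Number of dependents -> frequency (Table 13.5; first row assumed to be 0).
freq = {0: 11, 1: 27, 2: 33, 3: 20, 4: 19}

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n
# Sample variance uses n - 1 in the denominator.
variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
s = sqrt(variance)

# 95% confidence leaves 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)
error_bound = z * s / sqrt(n)
ci = (mean - error_bound, mean + error_bound)

print(f"n = {n}, mean = {mean:.4f}, s = {s:.4f}")
print(f"approximate 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```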

Exercise 13.1.24 (Solution on p. 577.)

The 95% Confidence Interval above means:

A. 5% of Confidence Intervals constructed this way will not contain the true population average number of dependents.

B. We are 95% confident the true population average number of dependents falls in the interval. 

C. Both of the above answer choices are correct. 

D. None of the above. 

Exercise 13.1.25 (Solution on p. 578.) 

X ~ U(4, 10). Find the 30th percentile.

A. 0.3000 

B. 3 

C. 5.8 

D. 6.1 

Exercise 13.1.26 (Solution on p. 578.) 

If X ~ Exp(0.8), then P(X < μ) =

A. 0.3679 

B. 0.4727 

C. 0.6321 

D. cannot be determined 

Exercise 13.1.27 (Solution on p. 578.) 

The lifetime of a computer circuit board is normally distributed with a mean of 2500 hours and a 
standard deviation of 60 hours. What is the probability that a randomly chosen board will last at 
most 2560 hours? 

A. 0.8413 

B. 0.1587 

C. 0.3461 

D. 0.6539 
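
The three preceding exercises each use a different continuous distribution. The sketch below shows one calculation per distribution, assuming that Exercise 13.1.25 refers to a uniform distribution on 4 to 10 and that Exercise 13.1.26 asks for the probability of falling below the mean of an exponential distribution with decay rate 0.8. The variable names are illustrative.

```python
from math import exp
from statistics import NormalDist

# Uniform on [a, b]: the p-th percentile is a + p * (b - a).
a, b = 4, 10
p30 = a + 0.30 * (b - a)

# Exponential with decay rate m: mean is 1/m and P(X < x) = 1 - e^(-m x).
rate = 0.8
mu = 1 / rate
p_below_mean = 1 - exp(-rate * mu)

# Normal: P(X <= 2560) for mean 2500 hours and standard deviation 60 hours.
p_board = NormalDist(mu=2500, sigma=60).cdf(2560)

print(f"uniform 30th percentile: {p30:.1f}")
print(f"exponential P(X < mean): {p_below_mean:.4f}")
print(f"normal P(X <= 2560): {p_board:.4f}")
```

For any exponential distribution the probability of falling below the mean is 1 - 1/e, so that result does not depend on the rate chosen.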

Exercise 13.1.28 (Solution on p. 578.) 

A survey of 123 Reservists called to active duty as a result of the September 11, 2001, attacks 
was conducted to determine the proportion that were married. Eighty-six reported being married. 
Construct a 98% confidence interval for the true population proportion of reservists called to active 
duty that are married. 

A. (0.6030,0.7954) 

B. (0.6181,0.7802) 

C. (0.5927,0.8057) 

D. (0.6312,0.7672) 
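
A confidence interval for a proportion takes the form p' ± z·sqrt(p'q'/n). The sketch below applies it to the reservist survey (86 married out of 123) at the 98% level; the variable names are illustrative, and the normal critical value comes from Python's standard library rather than a table.

```python
from math import sqrt
from statistics import NormalDist

n, married = 123, 86
p_hat = married / n  # point estimate of the proportion

# 98% confidence leaves 1% in each tail.
z = NormalDist().inv_cdf(1 - 0.02 / 2)
error_bound = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - error_bound, p_hat + error_bound)

print(f"p' = {p_hat:.4f}, z = {z:.3f}")
print(f"98% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```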

Exercise 13.1.29 (Solution on p. 578.) 

Winning times in 26 mile marathons run by world class runners average 145 minutes with a standard
deviation of 14 minutes. A sample of the last 10 marathon winning times is collected.

Let x̄ = average winning times for 10 marathons.

The distribution for x̄ is:

A. N(145, 14/√10)

B. N(145, 14)

C. t_9

D. t_10

Exercise 13.1.30 (Solution on p. 578.) 

Suppose that Phi Beta Kappa honors the top 1% of college and university seniors. Assume that 
grade point averages (G.P.A.) at a certain college are normally distributed with a 2.5 average and 
a standard deviation of 0.5. What would be the minimum G.P.A. needed to become a member of 
Phi Beta Kappa at that college? 

A. 3.99 

B. 1.34 

C. 3.00 

D. 3.66 

The number of people living on American farms has declined steadily during this century. Here are data 
on the farm population (in millions of persons) from 1935 to 1980. 



Year         1935   1940   1945   1950   1955   1960   1965   1970   1975   1980
Population   32.1   30.5   24.4   23.0   19.1   15.6   12.4   9.7    8.9    7.2

Table 13.6

The linear regression equation is y-hat = 1166.93 - 0.5868x 

Exercise 13.1.31 (Solution on p. 578.)

What was the expected farm population (in millions of persons) for 1980?

A. 7.2

B. 5.1

C. 6.0

D. 8.0
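
The regression equation quoted for Table 13.6 can be checked by fitting a least-squares line to the data and then substituting a year into it. The sketch below does both with plain arithmetic; the names `slope`, `intercept`, and `predicted_1980` are illustrative.

```python
years = list(range(1935, 1985, 5))
population = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(population) / n

# Least-squares slope and intercept.
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, population))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# "Expected" population for a year is the value on the fitted line,
# which generally differs from the observed value in the table.
predicted_1980 = intercept + slope * 1980

print(f"y-hat = {intercept:.2f} + ({slope:.4f})x")
print(f"predicted 1980 population: {predicted_1980:.1f} million")
```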



Exercise 13.1.32 (Solution on p. 578.)

In linear regression, which is the best possible SSE?

A. 13.46

B. 18.22

C. 24.05

D. 16.33



Exercise 13.1.33 (Solution on p. 578.) 

In regression analysis, if the correlation coefficient is close to 1 what can be said about the best fit 
line? 

A. It is a horizontal line. Therefore, we can not use it. 

B. There is a strong linear pattern. Therefore, it is most likely a good model to be used. 

C. The coefficient correlation is close to the limit. Therefore, it is hard to make a decision. 

D. We do not have the equation. Therefore, we can not say anything about it. 

Questions 34-36 refer to the following:

A study of the career plans of young women and men sent questionnaires to all 722 members of the senior 
class in the College of Business Administration at the University of Illinois. One question asked which 
major within the business program the student had chosen. Here are the data from the students who 
responded. 
                 Female   Male
Accounting       68       56
Administration   91       40
Economics        5        6
Finance          61       59

Table 13.7: Does the data suggest that there is a relationship between the gender of students and their choice of major?



Exercise 13.1.34 (Solution on p. 578.)

The distribution for the test is:

A. Chi^2_8

B. Chi^2_3

C. t_722

D. N(0,1)

Exercise 13.1.35 (Solution on p. 578.)

The expected number of females who choose Finance is:

A. 37

B. 61

C. 60

D. 70

Exercise 13.1.36 (Solution on p. 578.)

The p-value is 0.0127. The conclusion to the test is:



A. The choice of major and the gender of the student are independent of each other. 

B. The choice of major and the gender of the student are not independent of each other. 

C. Students find Economics very hard. 

D. More females prefer Administration than males. 
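
In a test of independence, each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The sketch below computes both for Table 13.7, along with the chi-square test statistic; converting the statistic to a p-value needs a chi-square table or calculator and is left out. The variable names are illustrative.

```python
# Major -> (female count, male count) from Table 13.7.
observed = {
    "Accounting": (68, 56),
    "Administration": (91, 40),
    "Economics": (5, 6),
    "Finance": (61, 59),
}

female_total = sum(f for f, _ in observed.values())
male_total = sum(m for _, m in observed.values())
grand_total = female_total + male_total

expected = {}
chi_square = 0.0
for major, (f, m) in observed.items():
    row_total = f + m
    exp_f = row_total * female_total / grand_total
    exp_m = row_total * male_total / grand_total
    expected[major] = (exp_f, exp_m)
    # Each cell contributes (observed - expected)^2 / expected.
    chi_square += (f - exp_f) ** 2 / exp_f + (m - exp_m) ** 2 / exp_m

df = (len(observed) - 1) * (2 - 1)

print(f"expected females in Finance: {expected['Finance'][0]:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```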

Exercise 13.1.37 (Solution on p. 578.) 

An agency reported that the work force nationwide is composed of 10% professional, 10% clerical, 
30% skilled, 15% service, and 35% semiskilled laborers. A random sample of 100 San Jose residents 
indicated 15 professional, 15 clerical, 40 skilled, 10 service, and 20 semiskilled laborers. At α = .10
does the work force in San Jose appear to be consistent with the agency report for the nation? 
Which kind of test is it? 



A. Chi² goodness of fit

B. Chi² test of independence

C. Independent groups proportions 

D. Unable to determine 
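
When observed counts in several categories are compared with percentages claimed for a population, the test statistic is the sum of (observed - expected)² / expected. The sketch below computes it for the San Jose sample, with expected counts taken as the national percentages applied to the sample of 100. Turning the statistic into a decision requires a chi-square table or calculator, which is not shown; the names are illustrative.

```python
categories = ["professional", "clerical", "skilled", "service", "semiskilled"]
national_share = [0.10, 0.10, 0.30, 0.15, 0.35]  # agency report
observed = [15, 15, 40, 10, 20]                  # San Jose sample

n = sum(observed)
expected = [share * n for share in national_share]

# Sum of (observed - expected)^2 / expected over all categories.
test_statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(categories) - 1

print(f"test statistic = {test_statistic:.4f}, df = {df}")
```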
13.2 Practice Final Exam 2 2

Exercise 13.2.1 (Solution on p. 578.) 

A study was done to determine the proportion of teenagers that own a car. The true proportion 
of teenagers that own a car is the: 

A. statistic 

B. parameter 

C. population 

D. variable 

The next two questions refer to the following data: 



value   frequency
0       1
1       4
2       7
3       9
6       4

Table 13.8



Exercise 13.2.2 (Solution on p. 578.)

The box plot for the data is:

[Answer choices A through D are box plot figures that are not reproduced here.]

2 This content is available online at <http://cnx.org/content/m16303/1.15/>.

Exercise 13.2.3 (Solution on p. 578.) 

If 6 were added to each value of the data in the table, the 15th percentile of the new list of values is:

A. 6 

B. 1 

C. 7 

D. 8 

The next two questions refer to the following situation: 

Suppose that the probability of a drought in any independent year is 20%. Out of those years in which a 
drought occurs, the probability of water rationing is 10%. However, in any year, the probability of water 
rationing is 5%. 

Exercise 13.2.4 (Solution on p. 578.) 

What is the probability of both a drought and water rationing occurring? 

A. 0.05 

B. 0.01 

C. 0.02 

D. 0.30 
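
The drought situation gives a conditional probability, so the multiplication rule P(A and B) = P(A)·P(B|A) applies. The sketch below also shows how independence and mutual exclusivity could be checked from the same three numbers; the variable names are illustrative.

```python
from math import isclose

p_drought = 0.20
p_rationing_given_drought = 0.10
p_rationing = 0.05

# Multiplication rule: P(drought AND rationing).
p_both = p_drought * p_rationing_given_drought

# Independent events satisfy P(rationing | drought) = P(rationing).
independent = isclose(p_rationing_given_drought, p_rationing)

# Mutually exclusive events satisfy P(drought AND rationing) = 0.
mutually_exclusive = isclose(p_both, 0.0, abs_tol=1e-12)

print(f"P(drought and rationing) = {p_both:.2f}")
print("independent:", independent)
print("mutually exclusive:", mutually_exclusive)
```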



Exercise 13.2.5 (Solution on p. 578.)

Which of the following is true?

A. drought and water rationing are independent events

B. drought and water rationing are mutually exclusive events

C. none of the above

The next two questions refer to the following situation:

Suppose that a survey yielded the following data:

Favorite Pie Type

gender   apple   pumpkin   pecan
female   40      10        30
male     20      30        10

Table 13.9

Exercise 13.2.6 (Solution on p. 578.) 

Suppose that one individual is randomly chosen. The probability that the person's favorite pie is 
apple or the person is male is: 
A. 40/60

B. 60/140

C. 120/140

D. 100/140



Exercise 13.2.7 (Solution on p. 578.) 

Suppose H_0 is: Favorite pie type and gender are independent.

The p-value is:

A. ≈ 0

B. 1 

C. 0.05 

D. cannot be determined 

The next two questions refer to the following situation: 

Let's say that the probability that an adult watches the news at least once per week is 0.60. We randomly 
survey 14 people. Of interest is the number that watch the news at least once per week. 

Exercise 13.2.8 (Solution on p. 578.) 

Which of the following statements is FALSE? 

A. X ~ B(14, 0.60)

B. The values for x are: {1, 2, 3, ..., 14}

C. μ = 8.4

D. P(X = 5) = 0.0408

Exercise 13.2.9 (Solution on p. 578.) 

Find the probability that at least 6 adults watch the news. 

A. 6/14

B. 0.8499 

C. 0.9417 

D. 0.6429 

Exercise 13.2.10 (Solution on p. 578.) 

The following histogram is most likely to be a result of sampling from which distribution? 
A. Chi-Square

B. Exponential 

C. Uniform 

D. Binomial 

The ages of campus day and evening students are known to be normally distributed. A sample of 6 campus
day and evening students reported their ages (in years) as: {18, 35, 27, 45, 20, 20} 

Exercise 13.2.11 (Solution on p. 578.) 

What is the error bound for the 90% confidence interval of the true average age? 

A. 11.2 

B. 22.3 

C. 17.5 

D. 8.7 
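
With a small sample and an unknown population standard deviation, the error bound for a mean uses a Student-t critical value: EBM = t·s/√n. The sketch below computes it for the six ages. The critical value 2.015 (90% confidence, 5 degrees of freedom) is hard-coded from a standard t table because Python's standard library has no t distribution; that constant and the variable names are assumptions of this illustration.

```python
from math import sqrt
from statistics import mean, stdev

ages = [18, 35, 27, 45, 20, 20]

n = len(ages)
x_bar = mean(ages)
s = stdev(ages)  # sample standard deviation (n - 1 in the denominator)

# Two-tailed 90% critical value for df = n - 1 = 5, taken from a t table.
T_CRIT_90_DF5 = 2.015

error_bound = T_CRIT_90_DF5 * s / sqrt(n)

print(f"x-bar = {x_bar}, s = {s:.2f}")
print(f"error bound = {error_bound:.2f}")
```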

Exercise 13.2.12 (Solution on p. 579.) 

If a normally distributed random variable has μ = 0 and σ = 1, then 97.5% of the population
values lie above: 

A. -1.96 

B. 1.96 

C. 1 

D. -1 

The next three questions refer to the following situation: 

The amount of money a customer spends in one trip to the supermarket is known to have an exponential
distribution. Suppose the average amount of money a customer spends in one trip to the supermarket is $72.

Exercise 13.2.13 (Solution on p. 579.) 

What is the probability that one customer spends less than $72 in one trip to the supermarket? 



A. 0.6321 
B. 0.5000

C. 0.3714 

D. 1 

Exercise 13.2.14 (Solution on p. 579.) 

How much money altogether would you expect the next 5 customers to spend in one trip to the
supermarket (in dollars)?

A. 72

B. 72²/5

C. 5184 

D. 360 

Exercise 13.2.15 (Solution on p. 579.) 

If you want to find the probability that the average of 5 customers is less than $60, the distribution 
to use is: 



A. N(72, 72)

B. N(72, 72/√5)

C. Exp(72)

D. Exp(1/72)



The next three questions refer to the following situation: 

The amount of time it takes a fourth grader to carry out the trash is uniformly distributed in the interval 
from 1 to 10 minutes. 

Exercise 13.2.16 (Solution on p. 579.) 

What is the probability that a randomly chosen fourth grader takes more than 7 minutes to take 
out the trash? 



A. 3/9

B. 7/9

C. 3/10

D. 7/10



Exercise 13.2.17 (Solution on p. 579.) 

Which graph best shows the probability that a randomly chosen fourth grader takes more than 6 
minutes to take out the trash given that he/she has already taken more than 3 minutes? 
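[Answer choices a) through d) are graphs of f(x) that are not reproduced here. Surviving labels: a) height 1/7 with x-axis marks 1, 6, 10; b) height 1/9 with x-axis marks 1, 6, 10; c) height 1/9 with x-axis marks 3, 6, 10; d) height 1/7 with x-axis marks 3, 10.]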



Exercise 13.2.18 (Solution on p. 579.) 

We should expect a fourth grader to take how many minutes to take out the trash? 

A. 4.5 

B. 5.5 

C. 5 

D. 10 
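
For a uniform distribution, probabilities are lengths of intervals divided by the total length, and a conditional probability simply shrinks the total length to the part that is still possible. The sketch below works the trash-carrying situation with exact fractions; `p_greater` is a helper written for this illustration.

```python
from fractions import Fraction

a, b = 1, 10  # time is uniform between 1 and 10 minutes


def p_greater(x, lo=a, hi=b):
    """P(X > x) for X uniform on [lo, hi], with lo <= x <= hi."""
    return Fraction(hi - x, hi - lo)


# P(X > 7): length of (7, 10) over length of (1, 10).
p_more_than_7 = p_greater(7)

# P(X > 6 | X > 3) = P(X > 6) / P(X > 3).
p_more_than_6_given_3 = p_greater(6) / p_greater(3)

# The mean of a uniform distribution is the midpoint of the interval.
mean = Fraction(a + b, 2)

print("P(X > 7) =", p_more_than_7)
print("P(X > 6 | X > 3) =", p_more_than_6_given_3)
print("mean =", float(mean))
```

The conditional result has the interval from 3 to 10 as its base, which is why the density height in the matching graph changes.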

The next three questions refer to the following situation: 

At the beginning of the quarter, the amount of time a student waits in line at the campus cafeteria is
normally distributed with a mean of 5 minutes and a standard deviation of 1.5 minutes.

Exercise 13.2.19 (Solution on p. 579.) 

What is the 90th percentile of waiting times (in minutes)? 

A. 1.28 

B. 90 

C. 7.47 

D. 6.92 



Exercise 13.2.20 (Solution on p. 579.)

The median waiting time (in minutes) for one student is:

A. 5

B. 50

C. 2.5

D. 1.5



Exercise 13.2.21 (Solution on p. 579.) 

Find the probability that the average wait time of 10 students is at most 5.5 minutes. 

A. 0.6301 

B. 0.8541 

C. 0.3694 
D. 0.1459
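
The cafeteria questions contrast one student's wait with the average wait of 10 students. A single wait follows N(5, 1.5), while the average of 10 waits follows N(5, 1.5/√10), so the second distribution is narrower. The sketch below uses Python's `statistics.NormalDist` for both; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Waiting time for one student.
one_wait = NormalDist(mu=5, sigma=1.5)
p90 = one_wait.inv_cdf(0.90)   # 90th percentile
median = one_wait.median       # equals the mean for a normal distribution

# Average waiting time for a sample of 10 students.
avg_wait = NormalDist(mu=5, sigma=1.5 / sqrt(10))
p_avg_at_most = avg_wait.cdf(5.5)

print(f"90th percentile: {p90:.2f} minutes")
print(f"median: {median} minutes")
print(f"P(average of 10 <= 5.5): {p_avg_at_most:.4f}")
```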

Exercise 13.2.22 (Solution on p. 579.) 

A sample of 80 software engineers in Silicon Valley is taken and it is found that 20% of them earn 
approximately $50,000 per year. A point estimate for the true proportion of engineers in Silicon 
Valley who earn $50,000 per year is: 

A. 16 

B. 0.2 

C. 1 

D. 0.95 



Exercise 13.2.23 (Solution on p. 579.)

If P(Z < z_α) = 0.1587 where Z ~ N(0, 1), then α is equal to:

A. -1

B. 0.1587

C. 0.8413

D. 1



Exercise 13.2.24 (Solution on p. 579.) 

A professor tested 35 students to determine their entering skills. At the end of the term, after 
completing the course, the same test was administered to the same 35 students to study their 
improvement. This would be a test of: 

A. independent groups 

B. 2 proportions 

C. dependent groups 

D. exclusive groups 

Exercise 13.2.25 (Solution on p. 579.) 

A math exam was given to all the third grade children attending ABC School. Two random 
samples of scores were taken. 





        n    x̄    s
Boys    55   82   5
Girls   60   86   7

Table 13.10

Which of the following correctly describes the results of a hypothesis test of the claim, "There is
a difference between the mean scores obtained by third grade girls and boys at the 5% level of
significance"?

A. Do not reject H_0. There is no difference in the mean scores.

B. Do not reject H_0. There is a difference in the mean scores.

C. Reject H_0. There is no difference in the mean scores.

D. Reject H_0. There is a difference in the mean scores.

Exercise 13.2.26 (Solution on p. 579.) 

In a survey of 80 males, 45 had played an organized sport growing up. Of the 70 females surveyed, 
25 had played an organized sport growing up. We are interested in whether the proportion for 
males is higher than the proportion for females. The correct conclusion is: 
A. The proportion for males is the same as the proportion for females.

B. The proportion for males is not the same as the proportion for females. 

C. The proportion for males is higher than the proportion for females. 

D. Not enough information to determine. 
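
A comparison of two independent proportions can be sketched as a z test with a pooled proportion. The code below applies it to the organized-sport survey as a right-tailed test (males higher than females). The exercise does not state a significance level, so the code stops at the p-value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

males, males_played = 80, 45
females, females_played = 70, 25

p_m = males_played / males
p_f = females_played / females

# Pooled proportion, used because H_0 says the two proportions are equal.
p_pooled = (males_played + females_played) / (males + females)
std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / males + 1 / females))

z = (p_m - p_f) / std_error
# Right-tailed test: area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(f"p_m = {p_m:.4f}, p_f = {p_f:.4f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```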

Exercise 13.2.27 (Solution on p. 579.) 

Note: Chi-Square Test of a Single Variance; Not all classes cover this topic. From past experience, 
a statistics teacher has found that the average score on a midterm is 81 with a standard deviation 
of 5.2. This term, a class of 49 students had a standard deviation of 5 on the midterm. Do the data 
indicate that we should reject the teacher's claim that the standard deviation is 5.2? Use α = 0.05.

A. Yes 

B. No 

C. Not enough information given to solve the problem 

Exercise 13.2.28 (Solution on p. 579.) 

Note: F Distribution Test of ANOVA; Not all classes cover this topic. Three loading machines 
are being compared. Ten samples were taken for each machine. Machine I took an average of 31 
minutes to load packages with a standard deviation of 2 minutes. Machine II took an average of 28 
minutes to load packages with a standard deviation of 1.5 minutes. Machine III took an average of 
29 minutes to load packages with a standard deviation of 1 minute. Find the p-value when testing 
that the average loading times are the same. 

A. the p-value is close to 0

B. p-value is close to 1 

C. Not enough information given to solve the problem 

The next three questions refer to the following situation: 

A corporation has offices in different parts of the country. It has gathered the following information
concerning the number of bathrooms and the number of employees at seven sites:



Number of employees x   650   730   810   900   102   107   1150
Number of bathrooms y   40    50    54    61    82    110   121

Table 13.11

Exercise 13.2.29 (Solution on p. 579.) 

Is the correlation between the number of employees and the number of bathrooms significant? 

A. Yes 

B. No 

C. Not enough information to answer question 



Exercise 13.2.30 (Solution on p. 579.)

The linear regression equation is:

A. y = 0.0094 - 79.96x

B. y = 79.96 + 0.0094x

C. y = 79.96 - 0.0094x

D. y = -0.0094 + 79.96x
Exercise 13.2.31 (Solution on p. 579.)

If a site has 1150 employees, approximately how many bathrooms should it have? 

A. 69 

B. 91 

C. 91,954 

D. We should not be estimating here. 

Exercise 13.2.32 (Solution on p. 579.) 

Note: Chi-Square Test of a Single Variance; Not all classes cover this topic. Suppose that a sample
of size 10 was collected, with x̄ = 4.4 and s = 1.4.

H_0: σ² = 1.6 vs. H_a: σ² ≠ 1.6. Which graph best describes the results of the test?

[The answer choices are graphs that are not reproduced here. Surviving axis labels include -1.96 and 1.96, 11.03, and -2.23 and 2.23.]

Exercise 13.2.33 (Solution on p. 579.) 

64 backpackers were asked the number of days their latest backpacking trip was. The number of 
days is given in the table below: 



# of days   1   2   3   4    5   6    7   8
Frequency   5   9   6   12   7   10   5   10

Table 13.12

Conduct an appropriate test to determine if the distribution is uniform.

A. The p-value is > 0.10 , the distribution is uniform. 

B. The p-value is < 0.01 , the distribution is uniform. 

C. The p-value is between 0.01 and 0.10, but without α there is not enough information

D. There is no such test that can be conducted. 



Exercise 13.2.34 (Solution on p. 579.) 

Note: F Distribution test of ANOVA; Not all classes cover this topic. Which of the following 
statements is true when using one-way ANOVA? 

A. The populations from which the samples are selected have different distributions. 

B. The sample sizes are large. 

C. The test is to determine if the different groups have the same averages. 

D. There is a correlation between the factors of the experiment. 
13.3 Data Sets 3

13.3.1 Lap Times 

The following tables provide lap times from Terri Vogel's Log Book. Times are recorded in seconds for 
2.5-mile laps completed in a series of races and practice runs. 

Race Lap Times (in Seconds) 





          Lap 1   Lap 2   Lap 3   Lap 4   Lap 5   Lap 6   Lap 7
Race 1    135     130     131     132     130     131     133
Race 2    134     131     131     129     128     128     129
Race 3    129     128     127     127     130     127     129
Race 4    125     125     126     125     124     125     125
Race 5    133     132     132     132     131     130     132
Race 6    130     130     130     129     129     130     129
Race 7    132     131     133     131     134     134     131
Race 8    127     128     127     130     128     126     128
Race 9    132     130     127     128     126     127     124
Race 10   135     131     131     132     130     131     130
Race 11   132     131     132     131     130     129     129
Race 12   134     130     130     130     131     130     130
Race 13   128     127     128     128     128     129     128
Race 14   132     131     131     131     132     130     130
Race 15   136     129     129     129     129     129     129
Race 16   129     129     129     128     128     129     129
Race 17   134     131     132     131     132     132     132
Race 18   129     129     130     130     133     133     127
Race 19   130     129     129     129     129     129     128
Race 20   131     128     130     128     129     130     130

Table 13.13



3 This content is available online at <http://cnx.org/content/m17132/1.5/>.
Practice Lap Times (in Seconds)

              Lap 1   Lap 2   Lap 3   Lap 4   Lap 5   Lap 6   Lap 7
Practice 1    142     143     180     137     134     134     172
Practice 2    140     135     134     133     128     128     131
Practice 3    130     133     130     128     135     133     133
Practice 4    141     136     137     136     136     136     145
Practice 5    140     138     136     137     135     134     134
Practice 6    142     142     139     138     129     129     127
Practice 7    139     137     135     135     137     134     135
Practice 8    143     136     134     133     134     133     132
Practice 9    135     134     133     133     132     132     133
Practice 10   131     130     128     129     127     128     127
Practice 11   143     139     139     138     138     137     138
Practice 12   132     133     131     129     128     127     126
Practice 13   149     144     144     139     138     138     137
Practice 14   133     132     137     133     134     130     131
Practice 15   138     136     133     133     132     131     131

Table 13.14



13.3.2 Stock Prices 

The following table lists initial public offering (IPO) stock prices for all 1999 stocks that at least doubled in 
value during the first day of trading. This is historical data. 
IPO Offer Prices

$17.00   $23.00   $14.00   $16.00   $12.00   $26.00   $20.00
$22.00   $14.00   $15.00   $22.00   $18.00   $18.00   $21.00
$21.00   $19.00   $15.00   $21.00   $18.00   $17.00   $15.00
$25.00   $14.00   $30.00   $16.00   $10.00   $20.00   $12.00
$16.00   $17.44   $16.00   $14.00   $15.00   $20.00   $20.00
$16.00   $17.00   $16.00   $15.00   $15.00   $19.00   $48.00
$16.00   $18.00   $9.00    $18.00   $18.00   $20.00   $8.00
$20.00   $17.00   $14.00   $11.00   $16.00   $19.00   $15.00
$21.00   $12.00   $8.00    $16.00   $13.00   $14.00   $15.00
$14.00   $13.41   $28.00   $21.00   $17.00   $28.00   $17.00
$19.00   $16.00   $17.00   $19.00   $18.00   $17.00   $15.00
$14.00   $21.00   $12.00   $18.00   $24.00
$15.00   $23.00   $14.00   $16.00   $12.00
$24.00   $20.00   $14.00   $14.00   $15.00
$14.00   $19.00   $16.00   $38.00   $20.00
$24.00   $16.00   $8.00    $18.00   $17.00
$16.00   $15.00   $7.00    $19.00   $12.00
$8.00    $23.00   $12.00   $18.00   $20.00
$21.00   $34.00   $16.00   $26.00   $14.00

Table 13.15



NOTE: Data compiled by Jay R. Ritter of Univ. of Florida using data from Securities Data Co. and 
Bloomberg. 
13.4 Group Projects

13.4.1 Group Project: Univariate Data 4 

13.4.1.1 Student Learning Objectives 

• The student will design and carry out a survey. 

• The student will analyze and graphically display the results of the survey. 

13.4.1.2 Instructions 

As you complete each task below, check it off. Answer all questions in your summary. 

Decide what data you are going to study.

EXAMPLES: Here are two examples, but you may NOT use them: number of M&M's per
small bag, number of pencils students have in their backpacks.

Are your data discrete or continuous? How do you know?

Decide how you are going to collect the data (for instance, buy 30 bags of M&M's; collect data from
the World Wide Web).

Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random
(using a random number generator) sampling. Do not use convenience sampling. What method did
you use? Why did you pick that method?

Conduct your survey. Your data size must be at least 30.

Summarize your data in a chart with columns showing data value, frequency, relative frequency
and cumulative relative frequency.

Answer the following (rounded to 2 decimal places):

1. x̄ =

2. s =

3. First quartile =

4. Median =

5. 70th percentile =

What value is 2 standard deviations above the mean?

What value is 1.5 standard deviations below the mean?

Construct a histogram displaying your data.

In complete sentences, describe the shape of your graph.

Do you notice any potential outliers? If so, what values are they? Show your work in how you used
the potential outlier formula in Chapter 2 (since you have univariate data) to determine whether or
not the values might be outliers.

Construct a box plot displaying your data.

Does the middle 50% of the data appear to be concentrated together or spread apart? Explain how
you determined this.

Looking at both the histogram and the box plot, discuss the distribution of your data.
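
The summary statistics requested above can be sketched in a few lines once the data are collected. The example below uses a made-up data set of 30 values (purely hypothetical, not from any survey) and assumes that the "potential outlier formula" is the usual rule of 1.5 × IQR beyond the quartiles. Quartile conventions vary between calculators and software, so values from `statistics.quantiles` may differ slightly from those of a graphing calculator.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical survey results (30 values), used only to illustrate the steps.
data = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6,
        7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 10, 11, 12, 14, 25]

x_bar = mean(data)
s = stdev(data)                      # sample standard deviation
q1, q2, q3 = quantiles(data, n=4)    # quartiles; q2 is the median
p70 = quantiles(data, n=10)[6]       # 7th decile = 70th percentile

two_sd_above = x_bar + 2 * s
one_and_half_sd_below = x_bar - 1.5 * s

# Assumed potential-outlier rule: beyond 1.5 * IQR from the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"Q1 = {q1:.2f}, median = {q2:.2f}, 70th percentile = {p70:.2f}")
print(f"2 SD above mean = {two_sd_above:.2f}")
print(f"1.5 SD below mean = {one_and_half_sd_below:.2f}")
print("potential outliers:", outliers)
```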



13.4.1.3 Assignment Checklist 

You need to turn in the following typed and stapled packet, with pages in the following order: 

Cover sheet: name, class time, and name of your study

Summary page: This should contain paragraphs written with complete sentences. It should include
answers to all the questions above. It should also include statements describing the population under
study, the sample, a parameter or parameters being studied, and the statistic or statistics produced.

URL for data, if your data are from the World Wide Web.

Chart of data, frequency, relative frequency and cumulative relative frequency.

Page(s) of graphs: histogram and box plot.

4 This content is available online at <http://cnx.org/content/m17142/1.8/>.
13.4.2 Group Project: Continuous Distributions and Central Limit Theorem 5

13.4.2.1 Student Learning Objectives 

• The student will collect a sample of continuous data. 

• The student will attempt to fit the data sample to various distribution models. 

• The student will validate the Central Limit Theorem. 

13.4.2.2 Instructions 

As you complete each task below, check it off. Answer all questions in your summary. 

13.4.2.3 Part I: Sampling 

Decide what continuous data you are going to study. (Here are two examples, but you may NOT use
them: the amount of money a student spends on college supplies this term or the length of a long
distance telephone call.)

Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random
(using a random number generator) sampling. Do not use convenience sampling. What method did
you use? Why did you pick that method?

Conduct your survey. Gather at least 150 pieces of continuous quantitative data.

Define (in words) the random variable for your data. X =

Create 2 lists of your data: (1) unordered data, (2) in order of smallest to largest.

Find the sample mean and the sample standard deviation (rounded to 2 decimal places).

1. x̄ =

2. s =



Construct a histogram of your data containing 5-10 intervals of equal width. The histogram should 
be a representative display of your data. Label and scale it. 



13.4.2.4 Part II: Possible Distributions 

Suppose that X followed the theoretical distributions below. Set up each distribution using the
appropriate information from your data.

Uniform: X ~ U. Use the lowest and highest values as a and b.

Exponential: X ~ Exp. Use x̄ to estimate μ.

Normal: X ~ N. Use x̄ to estimate μ and s to estimate σ.

Must your data fit one of the above distributions? Explain why or why not.

Could the data fit 2 or 3 of the above distributions (at the same time)? Explain.

Calculate the value k (an X value) that is 1.75 standard deviations above the sample mean. k =
(rounded to 2 decimal places) Note: k = x̄ + (1.75) * s

Determine the relative frequencies (RF) rounded to 4 decimal places.

1. RF = frequency / total number surveyed

2. RF (X < k) =

3. RF (X > k) =

4. RF (X = k) =

Use a separate piece of paper for EACH distribution (uniform, exponential, normal) to respond to the 
following questions. 



5 This content is available online at <http://cnx.org/content/m17141/1.9/>.
NOTE: You should have one page for the uniform, one page for the exponential, and one page for
the normal.

State the distribution: X ~

Draw a graph for each of the three theoretical distributions. Label the axes and mark them
appropriately.

Find the following theoretical probabilities (rounded to 4 decimal places).

1. P(X < k) =

2. P(X > k) =

3. P(X = k) =

Compare the relative frequencies to the corresponding probabilities. Are the values close?

Does it appear that the data fit the distribution well? Justify your answer by comparing the
probabilities to the relative frequencies, and the histograms to the theoretical graphs.
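
The comparison in Part II can be sketched in code. This is a minimal illustration, not a required method: the helper names and the ten data values are hypothetical, the exponential model is assumed to have decay rate 1 / x̄, and the normal probability is computed with the error function from the standard library.

```python
import math
import statistics

def relative_frequency_below(data, k):
    """Fraction of observed values strictly less than k."""
    return sum(1 for x in data if x < k) / len(data)

def uniform_cdf(k, a, b):
    """P(X < k) for a uniform distribution on (a, b)."""
    return min(max((k - a) / (b - a), 0.0), 1.0)

def exponential_cdf(k, mean):
    """P(X < k) for an exponential distribution with decay rate 1 / mean."""
    return 1.0 - math.exp(-k / mean) if k > 0 else 0.0

def normal_cdf(k, mu, sigma):
    """P(X < k) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((k - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical data; a real project needs at least 150 values.
data = [2.5, 3.1, 4.0, 4.4, 5.2, 5.9, 6.3, 7.8, 9.0, 12.5]
x_bar = statistics.mean(data)
s = statistics.stdev(data)
k = round(x_bar + 1.75 * s, 2)

rf_below = round(relative_frequency_below(data, k), 4)
p_uniform = round(uniform_cdf(k, min(data), max(data)), 4)
p_exponential = round(exponential_cdf(k, x_bar), 4)
p_normal = round(normal_cdf(k, x_bar, s), 4)
print(k, rf_below, p_uniform, p_exponential, p_normal)
```

For any continuous model the theoretical P(X = k) is 0, so only the "less than" and "greater than" comparisons carry information; P(X > k) is 1 minus the value computed above.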

13.4.2.5 Part III: CLT Experiments 

From your original data (before ordering), use a random number generator to pick 40 samples of
size 5. For each sample, calculate the average.

On a separate page, attached to the summary, include the 40 samples of size 5, along with the 40
sample averages.

List the 40 averages in order from smallest to largest.

Define the random variable, X̄, in words. X̄ =

State the approximate theoretical distribution of X̄. X̄ ~

Base this on the mean and standard deviation from your original data.

Construct a histogram displaying your data. Use 5 to 6 intervals of equal width. Label and scale it.

Calculate the value k (an X̄ value) that is 1.75 standard deviations above the sample mean. k =
(rounded to 2 decimal places)

Determine the relative frequencies (RF) rounded to 4 decimal places.

1. RF(X̄ < k) =

2. RF(X̄ > k) =

3. RF(X̄ = k) =

Find the following theoretical probabilities (rounded to 4 decimal places).

1. P(X̄ < k) =

2. P(X̄ > k) =

3. P(X̄ = k) =

Draw the graph of the theoretical distribution of X̄.

Answer the questions below.

Compare the relative frequencies to the probabilities. Are the values close?

Does it appear that the data of averages fit the distribution of X̄ well? Justify your answer by
comparing the probabilities to the relative frequencies, and the histogram to the theoretical graph.

In 3 - 5 complete sentences for each, answer the following questions. Give thoughtful explanations.

In summary, do your original data seem to fit the uniform, exponential, or normal distributions?
Answer why or why not for each distribution. If the data do not fit any of those distributions, explain
why.

What happened to the shape and distribution when you averaged your data? In theory, what
should have happened? In theory, would "it" always happen? Why or why not?

Were the relative frequencies compared to the theoretical probabilities closer when comparing the
X or X̄ distributions? Explain your answer.
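
The CLT experiment in Part III can be rehearsed on simulated data before it is run on a real survey. The sketch below rests on several assumptions: the 150 "original" values are generated artificially, the samples are drawn without replacement, and k is taken to be 1.75 standard deviations of the averages (s divided by the square root of 5) above the mean. All names are hypothetical.

```python
import math
import random
import statistics

def sample_averages(data, num_samples=40, sample_size=5, seed=0):
    """Draw simple random samples from the data and return the average of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [statistics.mean(rng.sample(data, sample_size)) for _ in range(num_samples)]

def normal_cdf(k, mu, sigma):
    """P(X < k) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((k - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical "original data": 150 skewed values standing in for a real survey.
generator = random.Random(1)
original = [generator.expovariate(1 / 6.0) for _ in range(150)]

x_bar = statistics.mean(original)
s = statistics.stdev(original)
averages = sample_averages(original)

# Approximate theoretical distribution of the averages: normal with mean x_bar
# and standard deviation s / sqrt(5).
sigma_x_bar = s / math.sqrt(5)
k = x_bar + 1.75 * sigma_x_bar
rf_below = sum(1 for a in averages if a < k) / len(averages)
p_below = normal_cdf(k, x_bar, sigma_x_bar)
print(round(rf_below, 4), round(p_below, 4))
```

With samples of only size 5 from strongly skewed data, the relative frequency and the normal probability need not agree closely, which is one of the points the summary questions ask about.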
13.4.2.6 Assignment Checklist

You need to turn in the following typed and stapled packet, with pages in the following order: 

Cover sheet: name, class time, and name of your study 

Summary pages: These should contain several paragraphs written with complete sentences that
describe the experiment, including what you studied and your sampling technique, as well as answers
to all of the questions above.

URL for data, if your data are from the World Wide Web.

Pages, one for each theoretical distribution, with the distribution stated, the graph, and the
probability questions answered

Pages of the data requested

All graphs required 
13.4.3 Partner Project: Hypothesis Testing - Article 6

13.4.3.1 Student Learning Objectives 

• The student will identify a hypothesis testing problem in print. 

• The student will conduct a survey to verify or dispute the results of the hypothesis test. 

• The student will summarize the article, analysis, and conclusions in a report. 

13.4.3.2 Instructions 

As you complete each task below, check it off. Answer all questions in your summary. 

Find an article in a newspaper, magazine or on the internet which makes a claim about ONE
population mean or ONE population proportion. The claim may be based upon a survey that the article
was reporting on. Decide whether this claim is the null or alternate hypothesis.

Copy or print out the article and include a copy in your project, along with the source.

State how you will collect your data. (Convenience sampling is not acceptable.)

Conduct your survey. You must have more than 50 responses in your sample. When you hand in
your final project, attach the tally sheet or the packet of questionnaires that you used to collect data.
Your data must be real.

State the statistics that are a result of your data collection: sample size, sample mean, and sample
standard deviation, OR sample size and number of successes.

Make 2 copies of the appropriate solution sheet.

Record the hypothesis test on the solution sheet, based on your experiment. Do a DRAFT solution
first on one of the solution sheets and check it over carefully. Have a classmate check your solution
to see if it is done correctly. Make your decision using a 5% level of significance. Include the 95%
confidence interval on the solution sheet.

Create a graph that illustrates your data. This may be a pie or bar chart or may be a histogram or box
plot, depending on the nature of your data. Produce a graph that makes sense for your data and gives
useful visual information about your data. You may need to look at several types of graphs before
you decide which is the most appropriate for the type of data in your project.

Write your summary (in complete sentences and paragraphs, with proper grammar and correct
spelling) that describes the project. The summary MUST include:

1. Brief discussion of the article, including the source.

2. Statement of the claim made in the article (one of the hypotheses).

3. Detailed description of how, where, and when you collected the data, including the sampling
   technique. Did you use cluster, stratified, systematic, or simple random sampling (using a random
   number generator)? As stated above, convenience sampling is not acceptable.

4. Conclusion about the article claim in light of your hypothesis test. This is the conclusion of your
   hypothesis test, stated in words, in the context of the situation in your project in sentence form,
   as if you were writing this conclusion for a non-statistician.

5. Sentence interpreting your confidence interval in the context of the situation in your project.
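
For a claim about ONE population proportion, the test at the 5% level and the 95% confidence interval can be sketched as follows. This is an illustration only: it assumes a two-tailed test and the normal approximation, the function name is hypothetical, and the survey counts (36 successes out of 60) are made up. A claim about a mean needs the corresponding test for a mean instead.

```python
import math

def normal_cdf(z):
    """P(Z < z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_proportion_test(successes, n, p0, alpha=0.05):
    """Two-tailed z test of Ho: p = p0, plus a 95% confidence interval for p."""
    p_prime = successes / n
    # The test statistic uses the standard error computed under the null hypothesis.
    z = (p_prime - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # 1.96 is the z value with 0.025 area in the right tail (95% confidence level).
    ebp = 1.96 * math.sqrt(p_prime * (1 - p_prime) / n)
    decision = "reject" if p_value < alpha else "do not reject"
    return p_prime, z, p_value, (p_prime - ebp, p_prime + ebp), decision

# Hypothetical survey: an article claims 50% of students work; 36 of 60 respondents do.
p_prime, z, p_value, interval, decision = one_proportion_test(36, 60, 0.50)
print(round(z, 2), round(p_value, 4), decision)
print("95% CI:", tuple(round(bound, 4) for bound in interval))
```

Here `ebp` plays the role of the error bound for a proportion (EBP) listed in the symbols table. The decision and the interval still have to be restated in words, in the context of the article.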

13.4.3.3 Assignment Checklist 

Turn in the following typed (12 point) and stapled packet for your final project: 

Cover sheet containing your name(s), class time, and the name of your study. 

Summary, which includes all items listed on summary checklist. 

Solution sheet neatly and completely filled out. The solution sheet does not need to be typed. 



6 This content is available online at <http://cnx.org/content/m17140/1.8/>.
Graphic representation of your data, created following the guidelines discussed above. Include only
graphs which are appropriate and useful.

Raw data collected AND a table summarizing the sample data (n, xbar and s; or x, n, and p', as
appropriate for your hypotheses). The raw data does not need to be typed, but the summary does.
Hand in the data as you collected it. (Either attach your tally sheet or an envelope containing your
questionnaires.)
13.4.4 Partner Project: Hypothesis Testing - Word Problem 7

13.4.4.1 Student Learning Objectives 

• The student will write, edit, and solve a hypothesis testing word problem. 

13.4.4.2 Instructions 

Write an original hypothesis testing problem for either ONE population mean or ONE population
proportion. As you complete each task, check it off. Answer all questions in your summary. Look at the
homework for the Hypothesis Testing: Single Mean and Single Proportion chapter for examples (poems,
two acts of a play, a work related problem). The problems with names attached to them are problems
written by students in past quarters. Some other examples that are not in the homework include: a
soccer hypothesis testing poster, a cartoon, a news report, a children's story, a song.

Your problem must be original and creative. It also must be in proper English. If English is difficult
for you, have someone edit your problem.

Your problem must be at least 1/2 page, typed and single spaced. This DOES NOT include the data.
Data will make the problem longer and that is fine. For this problem, the data and story may be real
or fictional.

In the narrative of the problem, make it very clear what the null and alternative hypotheses are.

Your sample size must be LARGER THAN 50 (even if it is fictional).

State in your problem how you will collect your data.

Include your data with your word problem.

State the statistics that are a result of your data collection: sample size, sample mean, and sample
standard deviation, OR sample size and number of successes.

Create a graph that illustrates your problem. This may be a pie or bar chart or may be a histogram
or box plot, depending on the nature of your data. Produce a graph that makes sense for your data
and gives useful visual information about your data. You may need to look at several types of graphs
before you decide which is the most appropriate for your problem.

Make 2 copies of the appropriate solution sheet.

Record the hypothesis test on the solution sheet, based on your problem. Do a DRAFT solution first
on one of the solution sheets and check it over carefully. Make your decision using a 5% level of
significance. Include the 95% confidence interval on the solution sheet.



13.4.4.3 Assignment Checklist 

You need to turn in the following typed (12 point) and stapled packet for your final project: 

Cover sheet containing your name, the name of your problem, and the date 

The problem 

Data for the problem 

Solution sheet neatly and completely filled out. The solution sheet does not need to be typed. 

Graphic representation of the data, created following the guidelines discussed above. Include only
graphs that are appropriate and useful.

Sentences interpreting the results of the hypothesis test and the confidence interval in the context
of the situation in the project.



7 This content is available online at <http://cnx.org/content/m17144/1.7/>.
13.4.5 Group Project: Bivariate Data, Linear Regression, and Univariate Data 8

13.4.5.1 Student Learning Objectives 

• The student will collect a bivariate data sample through the use of appropriate sampling techniques.

• The student will attempt to fit the data to a linear model. 

• The student will determine the appropriateness of linear fit of the model. 

• The student will analyze and graph univariate data. 

13.4.5.2 Instructions 

1. As you complete each task below, check it off. Answer all questions in your introduction or summary.

2. Check your course calendar for intermediate and final due dates. 

3. Graphs may be constructed by hand or by computer, unless your instructor informs you otherwise. 
All graphs must be neat and accurate. 

4. All other responses must be done on the computer. 

5. Neatness and quality of explanations are used to determine your final grade. 

13.4.5.3 Part I: Bivariate Data 
Introduction 

State the bivariate data your group is going to study.

EXAMPLES: Here are two examples, but you may NOT use them: height vs. weight and age
vs. running distance.

Describe how your group is going to collect the data (for instance, collect data from the web, survey
students on campus).

Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random
sampling (using a random number generator). Convenience sampling is NOT acceptable.

Conduct your survey. Your number of pairs must be at least 30.

Print out a copy of your data.



Analysis 



On a separate sheet of paper construct a scatter plot of the data. Label and scale both axes.

State the least squares line and the correlation coefficient.

On your scatter plot, in a different color, construct the least squares line.

Is the correlation coefficient significant? Explain and show how you determined this.

Interpret the slope of the linear regression line in the context of the data in your project. Relate the
explanation to your data, and quantify what the slope tells you.

Does the regression line seem to fit the data? Why or why not? If the data do not seem to be linear,
explain if any other model seems to fit the data better.

Are there any outliers? If so, what are they? Show your work in how you used the potential outlier
formula in the Linear Regression and Correlation chapter (since you have bivariate data) to determine
whether or not any pairs might be outliers.
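
The least squares line, the correlation coefficient, and a potential outlier check can be sketched as below. This is a hedged illustration: the function names and the six data pairs are hypothetical (a real project needs at least 30 pairs), and the outlier rule is assumed to be "residual larger than 1.9 times s", following the 1.9s cut-off listed in the symbols table, with s computed as the square root of SSE / (n - 2). Check the rule against the Linear Regression and Correlation chapter before relying on it.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r, s) for the line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope
    a = y_bar - b * x_bar               # y-intercept
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))        # standard deviation of the residuals
    return a, b, r, s

def potential_outliers(xs, ys, cutoff=1.9):
    """Pairs whose vertical distance from the line exceeds cutoff * s (assumed rule)."""
    a, b, _, s = least_squares(xs, ys)
    return [(x, y) for x, y in zip(xs, ys) if abs(y - (a + b * x)) > cutoff * s]

# Hypothetical pairs for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
a, b, r, s = least_squares(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}")
print("potential outliers:", potential_outliers(xs, ys))
```

The slope `b` is the quantity to interpret in context: the estimated change in y for each one-unit increase in x.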



8 This content is available online at <http://cnx.org/content/m17143/1.6/>.
13.4.5.4 Part II: Univariate Data

In this section, you will use the data for ONE variable only. Pick the variable that is more interesting to 
analyze. For example: if your independent variable is sequential data such as year with 30 years and one 
piece of data per year, your x-values might be 1971, 1972, 1973, 1974, . . ., 2000. This would not be interesting 
to analyze. In that case, choose to use the dependent variable to analyze for this part of the project. 

Summarize your data in a chart with columns showing data value, frequency, relative frequency,
and cumulative relative frequency.

Answer the following, rounded to 2 decimal places:

1. Sample mean = 

2. Sample standard deviation = 

3. First quartile = 

4. Third quartile = 

5. Median = 

6. 70th percentile = 

7. Value that is 2 standard deviations above the mean = 

8. Value that is 1.5 standard deviations below the mean = 

Construct a histogram displaying your data. Group your data into 6-10 intervals of equal width.
Pick regularly spaced intervals that make sense in relation to your data. For example, do NOT group
data by age as 20-26, 27-33, 34-40, 41-47, 48-54, 55-61, ... Instead, maybe use age groups 19.5-24.5,
24.5-29.5, ... or 19.5-29.5, 29.5-39.5, 39.5-49.5, ...

In complete sentences, describe the shape of your histogram.

Are there any potential outliers? Which values are they? Show your work and calculations as to
how you used the potential outlier formula in chapter 2 (since you are now using univariate data) to
determine which values might be outliers.

Construct a box plot of your data.

Does the middle 50% of your data appear to be concentrated together or spread out? Explain how
you determined this.

Looking at both the histogram AND the box plot, discuss the distribution of your data. For example:
how does the spread of the middle 50% of your data compare to the spread of the rest of the data
represented in the box plot; how does this correspond to your description of the shape of the histogram;
how does the graphical display show any outliers you may have found; does the histogram show any
gaps in the data that are not visible in the box plot; are there any interesting features of your data that
you should point out.
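
The quartiles, percentile, and potential outlier calculations in Part II can be sketched as below. Two assumptions should be stated plainly: the chapter 2 formula is taken to be the usual rule (values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR), and the quartiles come from Python's default quantile method, which may differ slightly from a calculator's values. The twelve data values are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using Python's default quantile method."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

def potential_outliers(data):
    """Values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR (assumed chapter 2 rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical univariate data for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40]
summary = five_number_summary(data)
outliers = potential_outliers(data)
p70 = statistics.quantiles(data, n=10)[6]      # 70th percentile
mean = statistics.mean(data)
s = statistics.stdev(data)
two_sd_above = round(mean + 2 * s, 2)          # value 2 standard deviations above the mean
one_half_sd_below = round(mean - 1.5 * s, 2)   # value 1.5 standard deviations below the mean
print(summary, outliers, p70, two_sd_above, one_half_sd_below)
```

The five-number summary is exactly what the box plot displays, so the same numbers answer the question about how concentrated the middle 50% of the data is.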



13.4.5.5 Due Dates 

• Part I, Intro: (keep a copy for your records) 

• Part I, Analysis: (keep a copy for your records) 

• Entire Project, typed and stapled: 

Cover sheet: names, class time, and name of your study. 

Part I: label the sections "Intro" and "Analysis." 

Part II: 

Summary page containing several paragraphs written in complete sentences describing the
experiment, including what you studied and how you collected your data. The summary page
should also include answers to ALL the questions asked above.

All graphs requested in the project.

All calculations requested to support questions in data.

Description: what you learned by doing this project, what challenges you had, how you
overcame the challenges.




NOTE: Include answers to ALL questions asked, even if not explicitly repeated in the items 
above. 
13.5 Solution Sheets

13.5.1 Solution Sheet: Hypothesis Testing for Single Mean and Single Proportion 9 

Class Time: 
Name: 

a. Ho:

b. Ha:

c. In words, CLEARLY state what your random variable X̄ or P' represents.

d. State the distribution to use for the test.

e. What is the test statistic?

f. What is the p-value? In 1 - 2 complete sentences, explain what the p-value means for this problem.

g. Use the previous information to sketch a picture of this situation. CLEARLY label and scale the
   horizontal axis and shade the region(s) corresponding to the p-value.

Figure 13.1

h. Indicate the correct decision ("reject" or "do not reject" the null hypothesis), the reason for it, and
   write an appropriate conclusion, using complete sentences.

   i. Alpha:

   ii. Decision:

   iii. Reason for decision:

   iv. Conclusion:

i. Construct a 95% Confidence Interval for the true mean or proportion. Include a sketch of the graph
   of the situation. Label the point estimate and the lower and upper bounds of the Confidence Interval.




Figure 13.2 



9 This content is available online at <http://cnx.org/content/m17134/1.6/>.
13.5.2 Solution Sheet: Hypothesis Testing for Two Means, Paired Data, and Two Proportions 10

Class Time: 
Name: 



a. Ho:

b. Ha:

c. In words, clearly state what your random variable X̄1 - X̄2, P'1 - P'2, or X̄d represents.

d. State the distribution to use for the test.

e. What is the test statistic?

f. What is the p-value? In 1 - 2 complete sentences, explain what the p-value means for this problem.

g. Use the previous information to sketch a picture of this situation. CLEARLY label and scale the
   horizontal axis and shade the region(s) corresponding to the p-value.

Figure 13.3

h. Indicate the correct decision ("reject" or "do not reject" the null hypothesis), the reason for it, and
   write an appropriate conclusion, using complete sentences.

   i. Alpha:

   ii. Decision:

   iii. Reason for decision:

   iv. Conclusion:

i. In complete sentences, explain how you determined which distribution to use.



10 This content is available online at <http://cnx.org/content/m17133/1.6/>.
13.5.3 Solution Sheet: The Chi-Square Distribution 11

Class Time:
Name:

a. Ho:

b. Ha:

c. What are the degrees of freedom?

d. State the distribution to use for the test.

e. What is the test statistic?

f. What is the p-value? In 1 - 2 complete sentences, explain what the p-value means for this problem.

g. Use the previous information to sketch a picture of this situation. Clearly label and scale the
   horizontal axis and shade the region(s) corresponding to the p-value.




Figure 13.4 



h. Indicate the correct decision ("reject" or "do not reject" the null hypothesis) and write appropriate con- 
clusions, using complete sentences. 

i. Alpha: 

ii. Decision: 

iii. Reason for decision: 

iv. Conclusion: 



11 This content is available online at <http://cnx.org/content/m17136/1.5/>.


13.5.4 Solution Sheet: F Distribution and ANOVA 12 

Class Time: 

Name: 

a. Ho:

b. Ha:

c. df(n) =

d. df(d) =

e. State the distribution to use for the test.

f. What is the test statistic?

g. What is the p-value? In 1 - 2 complete sentences, explain what the p-value means for this problem. 

h. Use the previous information to sketch a picture of this situation. Clearly label and scale the horizontal 
axis and shade the region(s) corresponding to the p-value. 






Figure 13.5 



i. Indicate the correct decision ("reject" or "do not reject" the null hypothesis) and write appropriate
   conclusions, using complete sentences.

i. Alpha: 

ii. Decision: 

iii. Reason for decision: 

iv. Conclusion: 



12 This content is available online at <http://cnx.org/content/m17135/1.5/>.


13.6 English Phrases Written Mathematically 13 
13.6.1 English Phrases Written Mathematically 



When the English says:              Interpret this as:

X is at least 4.                    X ≥ 4
The minimum of X is 4.              X ≥ 4
X is no less than 4.                X ≥ 4
X is greater than or equal to 4.    X ≥ 4

X is at most 4.                     X ≤ 4
The maximum of X is 4.              X ≤ 4
X is no more than 4.                X ≤ 4
X is less than or equal to 4.       X ≤ 4
X does not exceed 4.                X ≤ 4

X is greater than 4.                X > 4
There are more X than 4.            X > 4
X exceeds 4.                        X > 4

X is less than 4.                   X < 4
There are fewer X than 4.           X < 4

X is 4.                             X = 4
X is equal to 4.                    X = 4
X is the same as 4.                 X = 4

X is not 4.                         X ≠ 4
X is not equal to 4.                X ≠ 4
X is not the same as 4.             X ≠ 4
X is different than 4.              X ≠ 4

Table 13.16



13 This content is available online at <http://cnx.org/content/m16307/1.5/>.
13.7 Symbols and their Meanings 14

Symbols and their Meanings

Chapter (1st used)                  Symbol          Spoken                          Meaning

Sampling and Data                   √               The square root of              same
Sampling and Data                   π               Pi                              3.14159... (a specific number)
Descriptive Statistics              Q1              Quartile one                    the first quartile
Descriptive Statistics              Q2              Quartile two                    the second quartile
Descriptive Statistics              Q3              Quartile three                  the third quartile
Descriptive Statistics              IQR             inter-quartile range            Q3-Q1=IQR
Descriptive Statistics              x̄               x-bar                           sample mean
Descriptive Statistics              μ               mu                              population mean
Descriptive Statistics              s, s_x, sx      s                               sample standard deviation
Descriptive Statistics              s², s_x²        s-squared                       sample variance
Descriptive Statistics              σ, σ_x, σx      sigma                           population standard deviation
Descriptive Statistics              σ², σ_x²        sigma-squared                   population variance
Descriptive Statistics              Σ               capital sigma                   sum
Probability Topics                  {}              brackets                        set notation
Probability Topics                  S               S                               sample space
Probability Topics                  A               Event A                         event A
Probability Topics                  P(A)            probability of A                probability of A occurring
Probability Topics                  P(A|B)          probability of A given B        prob. of A occurring given B has occurred
Probability Topics                  P(A or B)       prob. of A or B                 prob. of A or B or both occurring
Probability Topics                  P(A and B)      prob. of A and B                prob. of both A and B occurring (same time)
Probability Topics                  A'              A-prime, complement of A        complement of A, not A
Probability Topics                  P(A')           prob. of complement of A        same
Probability Topics                  G1              green on first pick             same
Probability Topics                  P(G1)           prob. of green on first pick    same
Discrete Random Variables           PDF             prob. distribution function     same
Discrete Random Variables           X               X                               the random variable X
Discrete Random Variables           X ~             the distribution of X           same
Discrete Random Variables           B               binomial distribution           same
Discrete Random Variables           G               geometric distribution          same
Discrete Random Variables           H               hypergeometric dist.            same
Discrete Random Variables           P               Poisson dist.                   same
Discrete Random Variables           λ               Lambda                          average of Poisson distribution
Discrete Random Variables           ≥               greater than or equal to        same
Discrete Random Variables           ≤               less than or equal to           same
Discrete Random Variables           =               equal to                        same
Discrete Random Variables           ≠               not equal to                    same
Continuous Random Variables         f(x)            f of x                          function of x
Continuous Random Variables         pdf             prob. density function          same
Continuous Random Variables         U               uniform distribution            same
Continuous Random Variables         Exp             exponential distribution        same
Continuous Random Variables         k               k                               critical value
Continuous Random Variables         f(x) =          f of x equals                   same
Continuous Random Variables         m               m                               decay rate (for exp. dist.)
The Normal Distribution             N               normal distribution             same
The Normal Distribution             z               z-score                         same
The Normal Distribution             Z               standard normal dist.           same
The Central Limit Theorem           CLT             Central Limit Theorem           same
The Central Limit Theorem           X̄               X-bar                           the random variable X-bar
The Central Limit Theorem           μ_x             mean of X                       the average of X
The Central Limit Theorem           μ_x̄             mean of X-bar                   the average of X-bar
The Central Limit Theorem           σ_x             standard deviation of X         same
The Central Limit Theorem           σ_x̄             standard deviation of X-bar     same
The Central Limit Theorem           ΣX              sum of X                        same
The Central Limit Theorem           Σx              sum of x                        same
Confidence Intervals                CL              confidence level                same
Confidence Intervals                CI              confidence interval             same
Confidence Intervals                EBM             error bound for a mean          same
Confidence Intervals                EBP             error bound for a proportion    same
Confidence Intervals                t               student-t distribution          same
Confidence Intervals                df              degrees of freedom              same
Confidence Intervals                t_(α/2)         student-t with α/2 area in      same
                                                    right tail
Confidence Intervals                p', p̂           p-prime; p-hat                  sample proportion of success
Confidence Intervals                q', q̂           q-prime; q-hat                  sample proportion of failure
Hypothesis Testing                  Ho              H-naught, H-sub o               null hypothesis
Hypothesis Testing                  Ha              H-a, H-sub a                    alternate hypothesis
Hypothesis Testing                  H1              H-1, H-sub 1                    alternate hypothesis
Hypothesis Testing                  α               alpha                           probability of Type I error
Hypothesis Testing                  β               beta                            probability of Type II error
Hypothesis Testing                  X̄1 - X̄2         X1-bar minus X2-bar             difference in sample means
                                    μ1 - μ2         mu-1 minus mu-2                 difference in population means
                                    P'1 - P'2       P1-prime minus P2-prime         difference in sample proportions
                                    p1 - p2         p1 minus p2                     difference in population proportions
Chi-Square Distribution             χ²              Ky-square                       Chi-square
                                    O               Observed                        Observed frequency
                                    E               Expected                        Expected frequency
Linear Regression and Correlation   y = a + bx      y equals a plus b-x             equation of a line
                                    ŷ               y-hat                           estimated value of y
                                    r               correlation coefficient         same
                                    ε               error                           same
                                    SSE             Sum of Squared Errors           same
                                    1.9s            1.9 times s                     cut-off value for outliers
F-Distribution and ANOVA            F               F-ratio                         F ratio

Table 13.17

14 This content is available online at <http://cnx.org/content/m16302/1.9/>.
13.8 Formulas 15

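Formula 13.1: Factorial

$n! = n(n-1)(n-2)\cdots(1)$

$0! = 1$

Formula 13.2: Combinations

$\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$

Formula 13.3: Binomial Distribution

$X \sim B(n, p)$

$P(X = x) = \binom{n}{x} p^{x} q^{n-x}$, for $x = 0, 1, 2, \ldots, n$

Formula 13.4: Geometric Distribution

$X \sim G(p)$

$P(X = x) = q^{x-1} p$, for $x = 1, 2, 3, \ldots$

Formula 13.5: Hypergeometric Distribution

$X \sim H(r, b, n)$

Formula 13.6: Poisson Distribution

$X \sim P(\mu)$

$P(X = x) = \frac{\mu^{x} e^{-\mu}}{x!}$

Formula 13.7: Uniform Distribution

$X \sim U(a, b)$

$f(x) = \frac{1}{b-a}$, $a < x < b$

Formula 13.8: Exponential Distribution

$X \sim Exp(m)$

$f(x) = m e^{-mx}$, $m > 0$, $x \geq 0$

Formula 13.9: Normal Distribution

$X \sim N(\mu, \sigma^{2})$

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$, $-\infty < x < \infty$

Formula 13.10: Gamma Function

$\Gamma(z) = \int_{0}^{\infty} x^{z-1} e^{-x}\,dx$, $z > 0$

$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$

$\Gamma(m+1) = m!$ for $m$, a nonnegative integer

otherwise: $\Gamma(a+1) = a\,\Gamma(a)$

Formula 13.11: Student-t Distribution

$X \sim t_{df}$

$f(x) = \frac{\left(1 + \frac{x^{2}}{n}\right)^{-\frac{n+1}{2}} \Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}$

$X = \frac{Z}{\sqrt{Y/n}}$

$Z \sim N(0, 1)$, $Y \sim \chi^{2}_{df}$, $n$ = degrees of freedom

Formula 13.12: Chi-Square Distribution

$X \sim \chi^{2}_{df}$

$f(x) = \frac{x^{\frac{n-2}{2}} e^{-\frac{x}{2}}}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}$, $x > 0$, $n$ = positive integer and degrees of freedom

Formula 13.13: F Distribution

$X \sim F_{df(n),\,df(d)}$

$df(n)$ = degrees of freedom for the numerator
$df(d)$ = degrees of freedom for the denominator

$f(x) = \frac{\Gamma\left(\frac{u+v}{2}\right)}{\Gamma\left(\frac{u}{2}\right)\Gamma\left(\frac{v}{2}\right)} \left(\frac{u}{v}\right)^{\frac{u}{2}} x^{\frac{u}{2}-1} \left[1 + \frac{u}{v}\,x\right]^{-\frac{u+v}{2}}$, where $u = df(n)$ and $v = df(d)$

$X = \frac{Y/u}{W/v}$, $Y$, $W$ are chi-square

15 This content is available online at <http://cnx.org/content/m16301/1.7/>.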
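
Several of the formulas above translate directly into code, which gives a quick way to check hand calculations. The sketch below is illustrative only: the function names are hypothetical, q is taken to be 1 - p, and only the standard library is used.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """P(X = x) when the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with average mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

def exponential_pdf(x, m):
    """f(x) for an exponential distribution with decay rate m."""
    return m * math.exp(-m * x) if x >= 0 else 0.0

def normal_pdf(x, mu, sigma):
    """f(x) for a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# The binomial probabilities over x = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(x, 10, 0.3) for x in range(11))
print(round(total, 10))
print(binomial_pmf(2, 4, 0.5))           # 6 * 0.25 * 0.25
print(math.gamma(5), math.factorial(4))  # Gamma(m + 1) = m! for m = 4
```

`math.gamma` is the standard library's gamma function, so the identities in Formula 13.10 can be checked numerically as well.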
13.9 Notes for the TI-83, 83+, 84 Calculator 16

13.9.1 Quick Tips 
Legend 

• An uppercase key name such as ENTER represents a button press

• [ ] represents yellow command or green letter behind a key

• < > represents items on the screen

To adjust the contrast

Press 2nd, then hold the up arrow to increase the contrast or the down arrow to decrease the contrast.

To capitalize letters and words

Press ALPHA to get one capital letter, or press 2nd, then ALPHA to set all button presses to capital
letters. You can return to the top-level button values by pressing ALPHA again.

To correct a mistake

If you hit a wrong button, just hit CLEAR and start again.

To write in scientific notation

Numbers in scientific notation are expressed on the TI-83, 83+, and 84 using E notation, such that...

• 4.321 E 4 = 4.321 × 10^4

• 4.321 E -4 = 4.321 × 10^-4

To transfer programs or equations from one calculator to another:

Both calculators: Insert your respective end of the link cable and press 2nd, then [LINK].

Calculator receiving information:

Step 1. Use the arrows to navigate to and select <RECEIVE>
Step 2. Press ENTER

Calculator sending information:

Step 1. Press appropriate number or letter.
Step 2. Use up and down arrows to access the appropriate item.
Step 3. Press ENTER to select item to transfer.
Step 4. Press right arrow to navigate to and select <TRANSMIT>.
Step 5. Press ENTER

NOTE: ERROR 35 LINK generally means that the cables have not been inserted far enough.

Both calculators: press 2nd, then [QUIT] to exit when done.



16 This content is available online at <http://cnx.org/content/m19710/1.6/>.
13.9.2 Manipulating One-Variable Statistics

NOTE: These directions are for entering data with the built-in statistical program. 

Sample Data 



Data 


Frequency 


-2 


10 


-1 


3 





4 


1 


5 


3 


8 



Table 13.18: We are manipulating 1-variable statistics. 



To begin:

Step 1. Turn on the calculator.

Step 2. Access statistics mode.

Step 3. Select <4:ClrList> to clear data from lists, if desired.

[4], [ENTER]

Step 4. Enter the list [L1] to be cleared.

[2nd], [L1], [ENTER]

Step 5. Display the last instruction.

[2nd], [ENTRY]

Step 6. Continue clearing the remaining lists in the same fashion, if desired.

[2nd], [L2], [ENTER]

Step 7. Access statistics mode.

Step 8. Select <1:Edit...>.

Step 9. Enter data. Data values go into [L1]. (You may need to arrow over to [L1].)

• Type in a data value and press [ENTER]. (For negative numbers, use the negate (-) key at the bottom of
the keypad.)

• Continue in the same manner until all data values are entered.

Step 10. In [L2], enter the frequencies for each data value in [L1].

• Type in a frequency and press [ENTER]. (If a data value appears only once, the frequency is "1".)

• Continue in the same manner until all frequencies are entered.

Step 11. Access statistics mode.
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Step 12. Navigate to <CALC> 
Step 13. Access <1:1-Var Stats>.

Step 14. Indicate that the data is in [L1]...

[2nd], [L1], [,]

Step 15. ...and indicate that the frequencies are in [L2].

[2nd], [L2], [ENTER]

Step 16. The statistics should be displayed. You may arrow down to get the remaining statistics. Repeat as necessary.
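
As a cross-check on the calculator display, the same one-variable statistics can be computed away from the calculator. The sketch below is only an illustration in Python: it assumes the sample data of Table 13.18 with the third data value taken to be 0, and it expands the frequency list into a flat list before using the standard `statistics` module.

```python
import statistics

# Sample data, written as value -> frequency.
# Assumption: the third data value in the table is 0.
freq = {-2: 10, -1: 3, 0: 4, 1: 5, 3: 8}

# Expand so that each value appears as many times as its frequency.
data = [x for x, f in freq.items() for _ in range(f)]

n = len(data)                      # number of data values
mean = statistics.mean(data)       # sample mean
s = statistics.stdev(data)         # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)    # population standard deviation (divides by n)
median = statistics.median(data)

print(f"n = {n}, mean = {mean:.3f}, s = {s:.3f}, sigma = {sigma:.3f}, median = {median}")
```

The two standard deviations correspond to the Sx and σx rows of the 1-Var Stats screen.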

13.9.3 Drawing Histograms 

NOTE: We will assume that the data is already entered.

We will construct 2 histograms with the built-in STAT PLOT application. The first way will use the default
ZOOM. The second way will involve customizing a new graph.

Step 1. Access graphing mode.

[2nd], [STAT PLOT]

Step 2. Select <1:Plot 1> to access plotting for the first graph.

Step 3. Use the arrows to navigate to <ON> to turn on Plot 1.

<ON>, [ENTER]

Step 4. Use the arrows to go to the histogram picture and select the histogram.

[ENTER]

Step 5. Use the arrows to navigate to <Xlist>.
Step 6. If "L1" is not selected, select it.

[2nd], [L1], [ENTER]

Step 7. Use the arrows to navigate to <Freq>.
Step 8. Assign the frequencies to [L2].

[2nd], [L2], [ENTER]

Step 9. Go back to access other graphs.

[2nd], [STAT PLOT]

Step 10. Use the arrows to turn off the remaining plots.

Step 11. Be sure to deselect or clear all equations before graphing.

To deselect equations: 

Step 1. Access the list of equations. 



Step 2. Select each equal sign (=). 

Step 3. Continue, until all equations are deselected. 
To clear equations: 




Step 1. Access the list of equations. 



Step 2. Use the arrow keys to navigate to the right of each equal sign (=) and clear them. 

Step 3. Repeat until all equations are deleted. 

To draw default histogram: 

Step 1. Access the ZOOM menu. 

Step 2. Select <9:ZoomStat>.

Step 3. The histogram will show with a window automatically set. 
To draw custom histogram: 



Step 1. Access [WINDOW] to set the graph parameters.
Step 2. • Xmin = -2.5

• Xmax = 3.5

• Xscl = 1 (width of bars)

• Ymin = 0

• Ymax = 10

• Yscl = 1 (spacing of tick marks on y-axis)

• Xres = 1

Step 3. Access [GRAPH] to see the histogram.
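
To see what the window settings do, it can help to count by hand how many data values fall in each bar. The following Python sketch is an illustration only: it assumes the sample data used earlier (with 0 as the third data value) and a window running from -2.5 to 3.5 with bars of width 1, and it prints a rough text histogram.

```python
# Sample data as value -> frequency (assumption: the third data value is 0).
freq = {-2: 10, -1: 3, 0: 4, 1: 5, 3: 8}

# Assumed window settings: left edge, right edge, and bar width.
x_min, x_max, x_scl = -2.5, 3.5, 1

n_bins = int((x_max - x_min) / x_scl)
counts = [0] * n_bins
for value, f in freq.items():
    idx = int((value - x_min) // x_scl)   # which bar the value falls into
    counts[idx] += f

# One row per bar: the interval it covers and a mark for each observation.
for i, c in enumerate(counts):
    left = x_min + i * x_scl
    print(f"[{left:5.1f}, {left + x_scl:5.1f})  {'#' * c}")
```

Because each bar is centered on a whole number, the tallest bar has height 10, which is why a y-axis maximum of about 10 is a sensible choice for this data.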

To draw box plots: 

Step 1. Access graphing mode. 

[2nd], [STAT PLOT]

Step 2. Select <1:Plot 1> to access the first graph.

Step 3. Use the arrows to select <ON> and turn on Plot 1.

Step 4. Use the arrows to select the box plot picture and enable it.

Step 5. Use the arrows to navigate to <Xlist>.
Step 6. If "L1" is not selected, select it.

[2nd], [L1], [ENTER]

Step 7. Use the arrows to navigate to <Freq>.
Step 8. Indicate that the frequencies are in [L2].

[2nd], [L2], [ENTER]

Step 9. Go back to access other graphs.

[2nd], [STAT PLOT]

Step 10. Be sure to deselect or clear all equations before graphing, using the method mentioned above.
Step 11. View the box plot.

[GRAPH]

13.9.4 Linear Regression

13.9.4.1 Sample Data 

The following data is real. The percent of declared ethnic minority students at De Anza College for selected 
years from 1970 - 1995 was: 



Year    Student Ethnic Minority Percentage
1970    14.13
1973    12.27
1976    14.08
1979    18.16
1982    27.64
1983    28.72
1986    31.86
1989    33.14
1992    45.37
1995    53.1

Table 13.19: The independent variable is "Year," while the dependent variable is "Student Ethnic Minority
Percent."



Student Ethnic Minority Percentage 
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Figure 13.6: By hand, verify the scatterplot above. 
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NOTE: The TI-83 has a built-in linear regression feature, which allows the data to be edited. The 
x-values will be in [L1]; the y-values in [L2].

To enter data and do linear regression: 

Step 1. [ON] Turns the calculator on.

Step 2. Before accessing this program, be sure to turn off all plots.

• Access graphing mode.

[2nd], [STAT PLOT]

• Turn off all plots.

[4], [ENTER]

Step 3. Round to 3 decimal places. To do so:

• Access the mode menu.

[MODE]

• Navigate to <Float> and then to the right to <3>.

• All numbers will be rounded to 3 decimal places until changed.

Step 4. Enter statistics mode and clear lists [L1] and [L2], as described above.

[STAT], [4]

Step 5. Enter editing mode to insert values for x and y.

[STAT], [ENTER]

Step 6. Enter each value. Press [ENTER] to continue.

To display the correlation coefficient: 

Step 1. Access the catalog. 
[2nd], [CATALOG]

Step 2. Arrow down and select <DiagnosticOn>.

Step 3. r and r² will be displayed during regression calculations.
Step 4. Access linear regression.

Step 5. Select the form of y = a + bx.

[8], [ENTER]

The display will show:
LinReg

• y = a + bx

• a = -3176.909

• b = 1.617

• r² = 0.924

• r = 0.961




This means the Line of Best Fit (Least Squares Line) is: 

• y = -3176.909 + 1.617x

• Percent = -3176.909 + 1.617(year #)

The correlation coefficient r = 0.961 
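
For readers who want to check the calculator's output by another route, the least-squares formulas can be applied directly to the data. The Python sketch below is an illustration, not the calculator's own algorithm: it uses the textbook formulas b = Sxy/Sxx, a = ȳ - b·x̄, and r = Sxy/√(Sxx·Syy), and the variable names are chosen only for this example.

```python
import math

# Data from Table 13.19.
years = [1970, 1973, 1976, 1979, 1982, 1983, 1986, 1989, 1992, 1995]
percent = [14.13, 12.27, 14.08, 18.16, 27.64, 28.72, 31.86, 33.14, 45.37, 53.1]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(percent) / n

# Sums of squared deviations and of cross products.
sxx = sum((x - x_bar) ** 2 for x in years)
syy = sum((y - y_bar) ** 2 for y in percent)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, percent))

b = sxy / sxx                    # slope
a = y_bar - b * x_bar            # y-intercept
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

print(f"a = {a:.3f}, b = {b:.3f}, r^2 = {r * r:.3f}, r = {r:.3f}")
```

Rounded to 3 decimal places, the printed values should agree with the LinReg display shown above.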
To see the scatter plot: 

Step 1. Access graphing mode. 

[2nd], [STAT PLOT]

Step 2. Select <1:Plot 1> to access plotting for the first graph.

Step 3. Navigate to and select <ON> to turn on Plot 1.

<ON>, [ENTER]

Step 4. Navigate to the first picture.
Step 5. Select the scatter plot.

Step 6. Navigate to <Xlist>.

Step 7. If [L1] is not selected, press [2nd], [L1] to select it.

Step 8. Confirm that the data values are in [L1].

[ENTER]

Step 9. Navigate to <Ylist>.
Step 10. Indicate that the y-values are in [L2].

[2nd], [L2], [ENTER]

Step 11. Go back to access other graphs.

[2nd], [STAT PLOT]

Step 12. Use the arrows to turn off the remaining plots.

Step 13. Access [WINDOW] to set the graph parameters.

• Xmin = 1970

• Xmax = 2000

• Xscl = 10 (spacing of tick marks on x-axis)

• Ymin = -0.05

• Ymax = 60

• Yscl = 10 (spacing of tick marks on y-axis)

• Xres = 1

Step 14. Be sure to deselect or clear all equations before graphing, using the instructions above.
Step 15. Press [GRAPH] to see the scatter plot.

To see the regression graph: 

Step 1. Access the equation menu. The regression equation will be put into Y1.

Step 2. Access the vars menu and navigate to <5:Statistics>.

[VARS], [5]

Step 3. Navigate to <EQ>.

Step 4. <1:RegEQ> contains the regression equation, which will be entered in Y1.
Step 5. Press [GRAPH]. The regression line will be superimposed over the scatter plot.

To see the residuals and use them to calculate the critical point for an outlier: 

Step 1. Access the list. RESID will be an item on the menu. Navigate to it.

[2nd], [LIST], <RESID>

Step 2. Confirm twice to view the list of residuals. Use the arrows to select them.

[ENTER], [ENTER]

Step 3. The critical point for an outlier is: $1.9\sqrt{\frac{SSE}{n-2}}$ where:

• n = number of pairs of data

• SSE = sum of the squared errors = $\sum \text{residual}^2$

Step 4. Store the residuals in [L3].

[STO►], [2nd], [L3], [ENTER]

Step 5. Calculate $\frac{\text{residual}^2}{n-2}$ for each residual. Note that n - 2 = 8.

[2nd], [L3], [x²], [÷], [8]

Step 6. Store this value in [L4].

[STO►], [2nd], [L4], [ENTER]

Step 7. Calculate the critical value using the equation above, that is, 1.9 × √(sum(L4)). The sum( function
is in the <MATH> submenu of [2nd], [LIST].

[1], [.], [9], [×], [2nd], [√], [2nd], [LIST], [►], [►], [5], [2nd], [L4], [)], [)], [ENTER]

Step 8. Verify that the calculator displays: 7.642669563. This is the critical value. 

Step 9. Compare the absolute value of each residual value in [L3] to 7.64 . If the absolute value is greater 
than 7.64, then the (x, y) corresponding point is an outlier. In this case, none of the points is an outlier. 
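
The outlier check can also be reproduced away from the calculator. The Python sketch below is an illustration under the same rule, critical value = 1.9·√(SSE/(n - 2)); it refits the least-squares line from the data in Table 13.19 so that it can be run on its own, and the variable names are assumptions made for this example.

```python
import math

years = [1970, 1973, 1976, 1979, 1982, 1983, 1986, 1989, 1992, 1995]
percent = [14.13, 12.27, 14.08, 18.16, 27.64, 28.72, 31.86, 33.14, 45.37, 53.1]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(percent) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, percent)) / sum(
    (x - x_bar) ** 2 for x in years
)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(years, percent)]
sse = sum(e * e for e in residuals)          # sum of the squared errors

critical = 1.9 * math.sqrt(sse / (n - 2))    # critical point for an outlier

# A point is flagged when the absolute value of its residual exceeds the critical value.
outliers = [(x, y) for x, y, e in zip(years, percent, residuals) if abs(e) > critical]

print(f"critical value = {critical:.4f}")
print("outliers:", outliers)
```

With this data the list of outliers comes out empty, in line with the conclusion in Step 9.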

To obtain estimates of y for various x-values: 

There are various ways to determine estimates for "y". One way is to substitute values for "x" in the
equation. Another way is to use [TRACE] on the graph of the regression line.

13.9.5 TI-83, 83+, 84 instructions for distributions and tests 
13.9.5.1 Distributions 

Access DISTR (for "Distributions"). 

For technical assistance, visit the Texas Instruments website at http://www.ti.com and enter your
calculator model into the "search" box.

Binomial Distribution 

• binompdf(n, p, x) corresponds to P(X = x)

• binomcdf(n, p, x) corresponds to P(X ≤ x)

• To see a list of all probabilities for x: 0, 1, . . . , n, leave off the "x" parameter. 
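
To make the difference between the two functions concrete, here is a minimal Python sketch of what they compute. The function names simply mirror the calculator's; they are stand-ins written for this illustration, not part of any library.

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomcdf(n, p, x):
    """P(X <= x): add up the pdf values for 0, 1, ..., x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))


# Example values chosen for illustration: 5 trials, success probability 0.7.
print(binompdf(5, 0.7, 3))   # probability of exactly 3 successes
print(binomcdf(5, 0.7, 3))   # probability of at most 3 successes
```

The cdf is always at least as large as the pdf for the same x, because it includes the pdf value as one of its terms.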

Poisson Distribution 

• poissonpdf(λ, x) corresponds to P(X = x)

• poissoncdf(λ, x) corresponds to P(X ≤ x)
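
The Poisson functions follow the same pattern. The sketch below is a Python illustration of the underlying formula P(X = x) = e^(-λ) λ^x / x!; the function names mirror the calculator's and are stand-ins written for this example.

```python
from math import exp, factorial


def poissonpdf(lam, x):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def poissoncdf(lam, x):
    """P(X <= x): add up the pdf values for 0, 1, ..., x."""
    return sum(poissonpdf(lam, k) for k in range(x + 1))


# Example values chosen for illustration: an average of 2 events per interval.
print(poissonpdf(2, 0))   # probability of no events
print(poissoncdf(2, 2))   # probability of at most 2 events
```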



7 http://www.ti.com
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Continuous Distributions (general) 

• -∞ uses the value -1EE99 for the left bound

• ∞ uses the value 1EE99 for the right bound

Normal Distribution

• normalpdf(x, μ, σ) yields a probability density function value (only useful to plot the normal curve,
in which case "x" is the variable)

• normalcdf(left bound, right bound, μ, σ) corresponds to P(left bound < X < right bound)

• normalcdf(left bound, right bound) corresponds to P(left bound < Z < right bound) - standard
normal

• invNorm(p, μ, σ) yields the critical value, k: P(X < k) = p

• invNorm(p) yields the critical value, k: P(Z < k) = p for the standard normal
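
For comparison, the same quantities can be obtained in Python with `statistics.NormalDist` from the standard library. The wrapper names `normalcdf` and `inv_norm` below are assumptions made for this sketch so that the calls read like the calculator's.

```python
from statistics import NormalDist


def normalcdf(left, right, mu=0.0, sigma=1.0):
    """P(left < X < right) for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(right) - dist.cdf(left)


def inv_norm(p, mu=0.0, sigma=1.0):
    """The value k with P(X < k) = p."""
    return NormalDist(mu, sigma).inv_cdf(p)


print(normalcdf(-1, 1))          # area within one standard deviation of the mean
print(normalcdf(-1e99, 1.96))    # -1E99 stands in for negative infinity
print(inv_norm(0.975))           # critical value with 97.5% of the area to its left
```

Leaving off the mean and standard deviation gives the standard normal, just as on the calculator.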

Student-t Distribution 

• tpdf (x,df) yields the probability density function value (only useful to plot the student-t curve, in 
which case "x" is the variable) 

• tcdf(left bound, right bound, df) corresponds to P(left bound < t < right bound) 

Chi-square Distribution 

• χ²pdf(x, df) yields the probability density function value (only useful to plot the χ² curve, in which
case "x" is the variable)

• χ²cdf(left bound, right bound, df) corresponds to P(left bound < χ² < right bound)

F Distribution 

• Fpdf (x , df num, df denom) yields the probability density function value (only useful to plot the F curve, 
in which case "x" is the variable) 

• Fcdf(left bound, right bound, df num, df denom) corresponds to P(left bound < F < right bound) 

13.9.5.2 Tests and Confidence Intervals 

Access STAT and TESTS. 

For the Confidence Intervals and Hypothesis Tests, you may enter the data into the appropriate lists and 
press DATA to have the calculator find the sample means and standard deviations. Or, you may enter the 
sample means and sample standard deviations directly by pressing STAT once in the appropriate tests. 

Confidence Intervals 

• ZInterval is the confidence interval for mean when σ is known

• TInterval is the confidence interval for mean when σ is unknown; s estimates σ.

• 1-PropZInt is the confidence interval for proportion 

NOTE: The confidence levels should be given as percents (ex. enter "95" for a 95% confidence 
level). 
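
As an illustration of what ZInterval computes, the sketch below builds the interval x̄ ± z·σ/√n in Python. The function name `z_interval` and the sample numbers are hypothetical, chosen only for this example; like the calculator, the function takes the confidence level as a percent.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, c_level):
    """Confidence interval for a mean when sigma is known.

    c_level is given as a percent, for example 90 for a 90% confidence level.
    """
    alpha = 1 - c_level / 100
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the standard normal
    ebm = z * sigma / sqrt(n)                 # error bound for the mean
    return x_bar - ebm, x_bar + ebm


# Hypothetical sample: mean 68, known sigma 3, sample size 36, 90% confidence.
low, high = z_interval(x_bar=68, sigma=3, n=36, c_level=90)
print(f"({low:.4f}, {high:.4f})")
```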

Hypothesis Tests 

• Z-Test is the hypothesis test for single mean when σ is known

• T-Test is the hypothesis test for single mean when σ is unknown; s estimates σ.

• 2-SampZTest is the hypothesis test for 2 independent means when both σ's are known

• 2-SampTTest is the hypothesis test for 2 independent means when both σ's are unknown

• 1-PropZTest is the hypothesis test for single proportion.

• 2-PropZTest is the hypothesis test for 2 proportions.

• χ²-Test is the hypothesis test for independence.

• χ²GOF-Test is the hypothesis test for goodness-of-fit (TI-84+ only).

• LinRegTTest is the hypothesis test for Linear Regression (TI-84+ only).

NOTE: Input the null hypothesis value in the row below "Inpt." For a test of a single mean, "μ0"
represents the null hypothesis. For a test of a single proportion, "p0" represents the null hypothesis.
Enter the alternate hypothesis on the bottom row.
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Solutions to Exercises in Chapter 13 

Solutions to Practice Final Exam 1 

Solution to Exercise 13.1.1 (p. 523) 

B: Independent. 

Solution to Exercise 13.1.2 (p. 523) 

*— 16 

Solution to Exercise 13.1.3 (p. 523) 

B: Two measurements are drawn from the same pair of individuals or objects. 

Solution to Exercise 13.1.4 (p. 524) 

B- ~^- 

°- 118 

Solution to Exercise 13.1.5 (p. 524) 

Ul 52 

Solution to Exercise 13.1.6 (p. 524) 

D - 40 

Solution to Exercise 13.1.7 (p. 524) 

B: 2.78 

Solution to Exercise 13.1.8 (p. 525) 
A: 8.25 

Solution to Exercise 13.1.9 (p. 525) 
C: 0.2870 

Solution to Exercise 13.1.10 (p. 525) 
C: Normal 

Solution to Exercise 13.1.11 (p. 525) 
D: Ha: pA ≠ pB

Solution to Exercise 13.1.12 (p. 525) 

B: believe that the pass rate for Math 1A is different than the pass rate for Math 1B when, in fact, the pass
rates are the same.
Solution to Exercise 13.1.13 (p. 526) 
B: not reject H 

Solution to Exercise 13.1.14 (p. 526) 
C: Iris 

Solution to Exercise 13.1.15 (p. 526) 
C: Student-t 

Solution to Exercise 13.1.16 (p. 527) 
B: is left-tailed 

Solution to Exercise 13.1.17 (p. 527) 
C: cluster sampling 
Solution to Exercise 13.1.18 (p. 527) 
C: Mode 

Solution to Exercise 13.1.19 (p. 527) 

A: the probability that an outcome of the data will happen purely by chance when the null hypothesis is 
true. 

Solution to Exercise 13.1.20 (p. 528) 
D: stratified 

Solution to Exercise 13.1.21 (p. 528) 
B:25 

Solution to Exercise 13.1.22 (p. 528) 
C:4 

Solution to Exercise 13.1.23 (p. 528) 
A: (1.85, 2.32) 
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Solution to Exercise 13.1.24 (p. 528) 

C: Both above are correct. 

Solution to Exercise 13.1.25 (p. 529) 

C:5.8 

Solution to Exercise 13.1.26 (p. 529) 

C: 0.6321 

Solution to Exercise 13.1.27 (p. 529) 

A: 0.8413 

Solution to Exercise 13.1.28 (p. 529) 

A: (0.6030, 0.7954) 

Solution to Exercise 13.1.29 (p. 529) 

A:N ( 145 '7Io) 

Solution to Exercise 13.1.30 (p. 530) 
D: 3.66 

Solution to Exercise 13.1.31 (p. 530) 
B:5.1 

Solution to Exercise 13.1.32 (p. 530) 
A: 13.46 

Solution to Exercise 13.1.33 (p. 530) 

B: There is a strong linear pattern. Therefore, it is most likely a good model to be used. 
Solution to Exercise 13.1.34 (p. 531) 
B: Chi 2 3 

Solution to Exercise 13.1.35 (p. 531) 
D:70 

Solution to Exercise 13.1.36 (p. 531) 

B: The choice of major and the gender of the student are not independent of each other. 
Solution to Exercise 13.1.37 (p. 531) 
A: Chi goodness of fit 

Solutions to Practice Final Exam 2 

Solution to Exercise 13.2.1 (p. 532) 

B: parameter 

Solution to Exercise 13.2.2 (p. 532) 

A 

Solution to Exercise 13.2.3 (p. 533) 

C:7 

Solution to Exercise 13.2.4 (p. 533) 

C: 0.02 

Solution to Exercise 13.2.5 (p. 533) 

C: none of the above 

Solution to Exercise 13.2.6 (p. 533) 

LJ - 140 

Solution to Exercise 13.2.7 (p. 534) 
A:w0 

Solution to Exercise 13.2.8 (p. 534) 
B: The values for x are: {1,2,3, ..., 14} 
Solution to Exercise 13.2.9 (p. 534) 
C: 0.9417 

Solution to Exercise 13.2.10 (p. 534) 
D: Binomial 
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Solution to Exercise 13.2.11 (p. 535) 

D:8.7 

Solution to Exercise 13.2.12 (p. 535) 

A: -1.96 

Solution to Exercise 13.2.13 (p. 535) 

A: 0.6321 

Solution to Exercise 13.2.14 (p. 536) 

D:360 

Solution to Exercise 13.2.15 (p. 536) 

B:N ( 72 'S) 

Solution to Exercise 13.2.16 (p. 536) 

A- 5 

A. 9 

Solution to Exercise 13.2.17 (p. 536) 

(D) 

Solution to Exercise 13.2.18 (p. 537) 
B:5.5 

Solution to Exercise 13.2.19 (p. 537) 
D: 6.92 

Solution to Exercise 13.2.20 (p. 537) 
A: 5 

Solution to Exercise 13.2.21 (p. 537) 
B: 0.8541 

Solution to Exercise 13.2.22 (p. 538) 
B:0.2 

Solution to Exercise 13.2.23 (p. 538) 
A: -1

Solution to Exercise 13.2.24 (p. 538) 
C: dependent groups 
Solution to Exercise 13.2.25 (p. 538) 
D: Reject H . There is a difference in the mean scores. 
Solution to Exercise 13.2.26 (p. 538) 

C: The proportion for males is higher than the proportion for females. 
Solution to Exercise 13.2.27 (p. 539) 
B:No 

Solution to Exercise 13.2.28 (p. 539) 
B: p-value is close to 1. 
Solution to Exercise 13.2.29 (p. 539) 
B:No 

Solution to Exercise 13.2.30 (p. 539) 
C: y = 79.96x - 0.0094 
Solution to Exercise 13.2.31 (p. 540) 
D: We should not be estimating here. 
Solution to Exercise 13.2.32 (p. 540) 
A 

Solution to Exercise 13.2.33 (p. 540) 
B: The p-value is < 0.01 , the distribution is uniform. 
Solution to Exercise 13.2.34 (p. 540) 
C: The test is to determine if the different groups have the same averages. 
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Tables 



NOTE: When you are finished with the table link, use the back button on your browser to return 
here. 

Tables (NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, 
January 3, 2009) 

• Student-t table 2 

• Normal table 3 

• Chi-Square table 4 

• F-table 5 

• All four tables can be accessed by going to http://www.itl.nist.gov/div898/handbook/eda/section3/eda367.htm 6 

95% Critical Values of the Sample Correlation Coefficient Table 

• 95% Critical Values of the Sample Correlation Coefficient 7 

NOTE: The url for this table is http://cnx.org/content/ml7098/latest/ 



x This content is available online at <http://cnx.org/content/ml9138/13/>. 
2 http://www.itl.nist.gov/div898/handbook/eda/section3/eda3672.htm
3 http://www.itl.nist.gov/div898/handbook/eda/section3/eda3671.htm
4 http://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm
5 http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm
6 http://www.itl.nist.gov/div898/handbook/eda/section3/eda367.htm
7 http://cnx.org/content/ml7098/latest/ 
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Glossary 



A Analysis of Variance 

Also referred to as ANOVA. A method of testing whether or not the means of three or more 
populations are equal. The method is applicable if: 

• All populations of interest are normally distributed. 

• The populations have equal standard deviations. 

• Samples (not necessarily of the same size) are randomly and independently selected from 
each population. 

The test statistic for analysis of variance is the F-ratio. 

Average 

A number that describes the central tendency of the data. There are a number of specialized 
averages, including the arithmetic mean, weighted mean, median, mode, and geometric mean. 

B Bernoulli Trials 

An experiment with the following characteristics: 

• There are only 2 possible outcomes called "success" and "failure" for each trial. 

• The probability p of a success is the same for any trial (so the probability q = 1 — p of a 
failure is the same for any trial). 

Binomial Distribution 

A discrete random variable (RV) which arises from Bernoulli trials. There are a fixed number, n, 
of independent trials. "Independent" means that the result of any trial (for example, trial 1) 
does not affect the results of the following trials, and all trials are conducted under the same 
conditions. Under these circumstances the binomial RV X is defined as the number of successes 
in n trials. The notation is: $X \sim B(n, p)$. The mean is $\mu = np$ and the standard deviation is
$\sigma = \sqrt{npq}$. The probability of exactly x successes in n trials is $P(X = x) = \binom{n}{x} p^x q^{n-x}$.

C Central Limit Theorem 

Given a random variable (RV) with known mean $\mu$ and known standard deviation $\sigma$, suppose we are
sampling with size n and we are interested in two new RVs: the sample mean, $\bar{X}$, and the
sample sum, $\Sigma X$. If the size n of the sample is sufficiently large, then
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $\Sigma X \sim N\left(n\mu, \sqrt{n}\,\sigma\right)$.
If the size n of the sample is sufficiently large, then the distribution of the sample
means and the distribution of the sample sums will approximate a normal distribution
regardless of the shape of the population. The mean of the sample means will equal the
population mean and the mean of the sample sums will equal n times the population mean.
The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$, is called the standard error
of the mean.
Coefficient of Correlation 
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A measure developed by Karl Pearson (early 1900s) that gives the strength of association 
between the independent variable and the dependent variable. The formula is: 

$r = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{\left[n\Sigma x^2 - (\Sigma x)^2\right]\left[n\Sigma y^2 - (\Sigma y)^2\right]}}$

where n is the number of data points. The coefficient cannot be more than 1 or less than -1.
The closer the coefficient is to ±1, the stronger the evidence of a significant linear relationship
between x and y.

Conditional Probability 

The likelihood that an event will occur given that another event has already occurred. 
Confidence Interval (CI) 

An interval estimate for an unknown population parameter. This depends on: 

• The desired confidence level. 

• Information that is known about the distribution (for example, known standard deviation). 

• The sample and its size. 

Confidence Level (CL) 

The percent expression for the probability that the confidence interval contains the true 
population parameter. For example, if the CL = 90%, then in 90 out of 100 samples the interval 
estimate will enclose the true population parameter. 

Contingency Table 

The method of displaying a frequency distribution as a table with rows and columns to show 
how two variables may be dependent (contingent) upon each other. The table provides an easy 
way to calculate conditional probabilities. 

Continuous Random Variable 

A random variable (RV) whose outcomes are measured. 

Example: The height of trees in the forest is a continuous RV. 

Cumulative Relative Frequency 

The term applies to an ordered set of observations from smallest to largest. The Cumulative 
Relative Frequency is the sum of the relative frequencies for all values that are less than or equal 
to the given value. 



D Data 



A set of observations (a set of possible outcomes). Most data can be put into two groups: 
qualitative (hair color, ethnic groups and other attributes of the population) and quantitative 
(distance traveled to college, number of children in a family, etc.). Quantitative data can be 
separated into two subgroups: discrete and continuous. Data is discrete if it is the result of 
counting (the number of students of a given ethnic group in a class, the number of books on a 
shelf, etc.). Data is continuous if it is the result of measuring (distance traveled, weight of 
luggage, etc.) 

Degrees of Freedom (df) 

The number of objects in a sample that are free to vary. 
Discrete Random Variable 
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A random variable (RV) whose outcomes are counted. 

E Equally Likely 

Each outcome of an experiment has the same probability. 

Error Bound for a Population Mean (EBM) 

The margin of error. Depends on the confidence level, sample size, and known or estimated 
population standard deviation. 

Error Bound for a Population Proportion(EBP) 

The margin of error. Depends on the confidence level, sample size, and the estimated (from the 
sample) proportion of successes. 

Event 

A subset in the set of all outcomes of an experiment. The set of all outcomes of an experiment is 
called a sample space and denoted usually by S. An event is any arbitrary subset in S. It can 
contain one outcome, two outcomes, no outcomes (empty subset), the entire sample space, etc. 
Standard notations for events are capital letters such as A, B, C, etc. 

Expected Value 

Expected arithmetic average when an experiment is repeated many times. (Also called the
mean). Notations: $E(x)$, $\mu$. For a discrete random variable (RV) with probability distribution
function $P(x)$, the definition can also be written in the form $E(x) = \mu = \sum x P(x)$.
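
As a short worked illustration of the sum Σ x·P(x), the Python sketch below uses the biased-coin example that appears under Probability Distribution Function in this glossary (5 tosses with probability 0.7 of a head). The variable names are chosen only for this example.

```python
from math import comb

n, p = 5, 0.7

# Probability distribution function of X ~ B(5, 0.7): x -> P(X = x).
pdf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Expected value: multiply each value by its probability and add.
expected = sum(x * px for x, px in pdf.items())

print(f"E(x) = {expected:.4f}")
print(f"np   = {n * p:.4f}")
```

For a binomial RV the sum agrees with the shortcut μ = np.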

Experiment 

A planned activity carried out under controlled conditions. 

Exponential Distribution 

A continuous random variable (RV) that appears when we are interested in the intervals of time
between some random events, for example, the length of time between emergency arrivals at a
hospital. Notation: $X \sim Exp(m)$. The mean is $\mu = \frac{1}{m}$ and the standard deviation is $\sigma = \frac{1}{m}$. The
probability density function is $f(x) = m e^{-mx}$, $x \ge 0$, and the cumulative distribution function
is $P(X \le x) = 1 - e^{-mx}$.
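
A brief numerical illustration of these formulas in Python follows. The function names and the decay rate m = 0.25 (a mean of 4 time units between events) are assumptions made for this sketch.

```python
from math import exp


def exp_pdf(x, m):
    """Density f(x) = m * e^(-m x) for x >= 0, and 0 otherwise."""
    return m * exp(-m * x) if x >= 0 else 0.0


def exp_cdf(x, m):
    """P(X <= x) = 1 - e^(-m x) for x >= 0, and 0 otherwise."""
    return 1 - exp(-m * x) if x >= 0 else 0.0


m = 0.25            # assumed rate, so the mean 1/m is 4
mean = 1 / m

print(exp_pdf(0, m))      # height of the density at x = 0 equals m
print(exp_cdf(mean, m))   # probability of falling below the mean
```

The probability of falling below the mean is 1 - e^(-1), about 0.63, whatever the value of m, which reflects how strongly the distribution is skewed to the right.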

F Frequency 

The number of times a value of the data occurs. 

H Hypothesis 

A statement about the value of a population parameter. In case of two hypotheses, the statement
assumed to be true is called the null hypothesis (notation $H_0$) and the contradictory statement is
called the alternate hypothesis (notation $H_a$).

Hypothesis Testing 

Based on sample evidence, a procedure to determine whether the hypothesis stated is a 
reasonable statement and cannot be rejected, or is unreasonable and should be rejected. 

I Independent Events 

The occurrence of one event has no effect on the probability of the occurrence of any other event.
Events A and B are independent if one of the following is true: (1) $P(A|B) = P(A)$; (2)
$P(B|A) = P(B)$; (3) $P(A \text{ and } B) = P(A)\,P(B)$.
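
A small check of these conditions can be carried out by counting outcomes. The Python sketch below is an illustration with a made-up experiment (one roll of a fair six-sided die); the events A and B are chosen only for this example.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # the roll is even
B = {1, 2, 3, 4}                    # the roll is at most 4


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


p_a, p_b, p_a_and_b = prob(A), prob(B), prob(A & B)
p_a_given_b = p_a_and_b / p_b       # conditional probability P(A|B)

print("P(A and B) =", p_a_and_b, " P(A)P(B) =", p_a * p_b)
print("P(A|B) =", p_a_given_b, " P(A) =", p_a)
independent = p_a_and_b == p_a * p_b
print("independent:", independent)
```

Exact fractions are used so that the equality test is not affected by rounding.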

Inferential Statistics 
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Also called statistical inference or inductive statistics. This facet of statistics deals with 
estimating a population parameter based on a sample statistic. For example, if 4 out of the 100 
calculators sampled are defective we might infer that 4 percent of the production is defective. 

L Level of Significance of the Test 

Probability of a Type I error (reject the null hypothesis when it is true). Notation: $\alpha$. In
hypothesis testing, the Level of Significance is called the preconceived $\alpha$ or the preset $\alpha$.

M Mean 

A number that measures the central tendency. A common name for mean is 'average.' The term
'mean' is a shortened form of 'arithmetic mean.' By definition, the mean for a sample (denoted
by $\bar{x}$) is $\bar{x} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$, and the mean for a population (denoted by $\mu$) is
$\mu = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$.

Median 

A number that separates ordered data into halves. Half the values are the same number or 
smaller than the median and half the values are the same number or larger than the median. 
The median may or may not be part of the data. 

Mode 

The value that appears most frequently in a set of data. 

Mutually Exclusive 

An observation cannot fall into more than one class (category). Being in more than one category 
prevents being in a mutually exclusive category. 

N Normal Distribution 

A continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, where $\mu$ is the mean of
the distribution and $\sigma$ is the standard deviation. Notation: $X \sim N(\mu, \sigma)$. If $\mu = 0$ and $\sigma = 1$, the
RV is called the standard normal distribution.



O Outcome (observation) 

A particular result of an experiment. 
Outlier 

An observation that does not fit the rest of the data. 

P p-value 

The probability that an event will happen purely by chance assuming the null hypothesis is true. 
The smaller the p-value, the stronger the evidence is against the null hypothesis. 

Parameter 

A numerical characteristic of the population. 
Point Estimate 

A single number computed from a sample and used to estimate a population parameter. 
Population 
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The collection, or set, of all individuals, objects, or measurements whose properties are being 
studied. 

Probability 

A number between 0 and 1, inclusive, that gives the likelihood that a specific event will occur.
The foundation of statistics is given by the following 3 axioms (by A. N. Kolmogorov, 1930s):
Let S denote the sample space and A and B are two events in S. Then:

• $0 \le P(A) \le 1$.

• If A and B are any two mutually exclusive events, then $P(A \text{ or } B) = P(A) + P(B)$.

• $P(S) = 1$.

Probability Distribution Function (PDF) 

A mathematical description of a discrete random variable (RV), given either in the form of an 
equation (formula) , or in the form of a table listing all the possible outcomes of an experiment 
and the probability associated with each outcome. 

Example: A biased coin with probability 0.7 for a head (in one toss of the coin) is tossed 5 times.
We are interested in the number of heads (the RV X = the number of heads). X is Binomial, so
$X \sim B(5, 0.7)$ and $P(X = x) = \binom{5}{x}\, 0.7^x\, 0.3^{5-x}$, or in the form of the table:

x    P(X = x)
0    0.0024
1    0.0284
2    0.1323
3    0.3087
4    0.3602
5    0.1681

Table 5.3

Proportion

• As a number: A proportion is the number of successes divided by the total number in the
sample.

• As a probability distribution: Given a binomial random variable (RV), $X \sim B(n, p)$, consider
the ratio of the number X of successes in n Bernoulli trials to the number n of trials: $P' = \frac{X}{n}$.
This new RV is called a proportion, and if the number of trials, n, is large enough,
$P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$.



Q Qualitative Data 
See Data. 
Quantitative Data 

R Random Variable (RV) 
see Variable 
Relative Frequency 
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The ratio of the number of times a value of the data occurs in the set of all outcomes to the 
number of all outcomes. 

S Sample 

A portion of the population under study. A sample is representative if it characterizes the
population being studied. 

Sample Space 

The set of all possible outcomes of an experiment. 

Standard Deviation 

A number that is equal to the square root of the variance and measures how far data values are
from their mean. Notation: s for sample standard deviation and $\sigma$ for population standard
deviation.

Standard Error of the Mean 

The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$.

Standard Normal Distribution 

A continuous random variable (RV) $X \sim N(0, 1)$. When X follows the standard normal
distribution, it is often noted as $Z \sim N(0, 1)$.

Statistic 

A numerical characteristic of the sample. A statistic estimates the corresponding population 
parameter. For example, the average number of full-time students in a 7:30 a.m. class for this 
term (statistic) is an estimate for the average number of full-time students in any class this term 
(parameter). 

Student's-t Distribution 

Investigated and reported by William S. Gosset in 1908 and published under the pseudonym
Student. The major characteristics of the random variable (RV) are: 

• It is continuous and assumes any real values. 

• The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at 
the apex than the normal distribution. 

• It approaches the standard normal distribution as n gets larger. 

• There is a "family" of t distributions: every representative of the family is completely 
defined by the number of degrees of freedom which is one less than the number of data. 

Student-t Distribution 

T Tree Diagram 

The useful visual representation of a sample space and events in the form of a "tree" with 
branches marked by possible outcomes simultaneously with associated probabilities 
(frequencies, relative frequencies). 

Type 1 Error 

The decision is to reject the Null hypothesis when, in fact, the Null hypothesis is true. 
Type 2 Error 

The decision is to not reject the Null hypothesis when, in fact, the Null hypothesis is false. 

U Uniform Distribution 

A continuous random variable (RV) that has equally likely outcomes over the domain,
a < x < b. Often referred to as the Rectangular distribution because the graph of the pdf has the
form of a rectangle. Notation: X~U (a, b). The mean is $\mu = \frac{a+b}{2}$ and the standard
deviation is $\sigma = \sqrt{\frac{(b-a)^2}{12}}$. The probability density function is
$f(x) = \frac{1}{b-a}$ for a < x < b or a ≤ x ≤ b. The cumulative distribution is
$P(X \le x) = \frac{x-a}{b-a}$.
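
The uniform distribution formulas can be checked with a short Python sketch. The function names
and the endpoints a = 0 and b = 12 are assumptions made for this illustration and are not taken
from the text.

```python
import math


def uniform_summary(a, b):
    """Return (mean, standard deviation) of X ~ U(a, b)."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    return mean, sd


def uniform_pdf(x, a, b):
    """Density f(x) = 1 / (b - a) on [a, b], and 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """Cumulative probability P(X <= x) = (x - a) / (b - a) on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 0, 12  # hypothetical endpoints
mean, sd = uniform_summary(a, b)
print("mean:", mean)
print("standard deviation:", round(sd, 4))
print("f(5):", round(uniform_pdf(5, a, b), 4))
print("P(X <= 3):", uniform_cdf(3, a, b))
```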

V Variable (Random Variable) 

A characteristic of interest in a population being studied. Common notation for variables is
upper case Latin letters X, Y, Z,...; common notation for a specific value from the domain (set of
all possible values of a variable) is lower case Latin letters x, y, z,.... For example, if X is the
number of children in a family, then x represents a specific integer 0, 1, 2, 3, .... Variables in
statistics differ from variables in intermediate algebra in the following two ways.

• The domain of the random variable (RV) is not necessarily a numerical set; the domain may
be expressed in words; for example, if X = hair color then the domain is {black, blond, gray,
green, orange}.

• We can tell what specific value x of the Random Variable X takes only after performing the 
experiment. 

Variance 

Mean of the squared deviations from the mean. Square of the standard deviation. For a set of
data, a deviation can be represented as $x - \bar{x}$ where x is a value of the data and $\bar{x}$
is the sample mean. The sample variance is equal to the sum of the squares of the deviations
divided by the difference of the sample size and 1.
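
A minimal Python sketch of the sample variance computation described above follows. The data
values and the function name sample_variance are made up for this illustration; the result is
compared with the standard library's statistics.variance, which also divides by n - 1.

```python
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    return sum(d * d for d in deviations) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
print("sample variance:", sample_variance(data))
print("library check:  ", statistics.variance(data))
print("sample standard deviation:", sample_variance(data) ** 0.5)
```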

Venn Diagram 

The visual representation of a sample space and events in the form of circles or ovals showing 
their intersections. 

Z z-score 

The linear transformation of the form $z = \frac{x - \mu}{\sigma}$. If this transformation is
applied to any normal distribution X~N ($\mu$, $\sigma$), the result is the standard normal
distribution Z~N (0, 1). If this transformation is applied to any specific value x of the RV with
mean $\mu$ and standard deviation $\sigma$, the result is called the z-score of x. Z-scores allow
us to compare data that are normally distributed but scaled differently.
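
To show how z-scores put differently scaled data on a common footing, here is a small Python
sketch. The two exams, their means and standard deviations, and the function name z_score are
hypothetical values chosen for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# Hypothetical scores on two exams that use different scales.
exam_a = z_score(85, mu=70, sigma=10)     # exam scored out of 100
exam_b = z_score(620, mu=500, sigma=100)  # exam scored out of 800

print("z-score on exam A:", exam_a)
print("z-score on exam B:", exam_b)
print("relatively better result:", "A" if exam_a > exam_b else "B")
```

Under these assumed values the score of 85 lies further above its mean, in standard deviation
units, than the score of 620 does.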

Index of Keywords and Terms

Keywords are listed by the section with that keyword (page numbers are in parentheses). Keywords 
do not necessarily appear in the text of the page. They are merely associated with that section. Ex. 
apples, § 1.1 (1) Terms are referenced by the page they appear on. Ex. apples, 1 



" "hypothesis testing.", 361 

A A AND B, § 4.2(158) 
A OR B, § 4.2(158) 
accessibility, § (5) 
addition, § 4.4(163) 
additional, § (5) 
adoption, § (5) 

alternate hypothesis, § 12.2(502), § 12.3(502) 
ANOVA, § 12.1(501), § 12.2(502), 502, 
§ 12.3(502), § 12.4(504), § 13.5.4(558) 
answer, § 1.7(20) 
appendix, § 13.3(541) 
article, § 13.4.3(549) 
average, § 1.4(15), § 5.3(205) 

B bar, § 2.4(49)
Bernoulli, § 5.5(208), § 5.9(217)
Bernoulli Trial, 209
binomial, § 5.4(208), § 5.5(208), § 5.6(212),
§ 5.9(217)
binomial distribution, 365
binomial probability distribution, 209
bivariate, § 13.4.5(552)
box, § 2.9(68), § 2.11(72)
boxes, § 2.4(49)

C cards, § 5.11(230)
categorical, § 1.4(15)
central, § 13.4.2(546)
Central Limit Theorem, § 7.2(274), § 7.3(277),
§ 7.10(303), 371
chance, § 4.2(158), § 4.3(160)
chi, § 11.4(458), § 11.5(465)
chi-square, § 13.5.3(557), § 14(581)
CLT, 278
cluster, § 1.9(25), § 1.13(39)
collaborative, § (1), § (5)
collection, 1
condition, § 4.3(160)
conditional, § 4.2(158), § 4.11(180)
conditional probability, 159
confidence interval, 312, 320
confidence intervals, 323, 361
confidence level, 313, 323
contingency, § 4.5(167), § 4.9(177)
contingency table, 167, 465
continuity correction factor, 283
Continuous, § 1.5(17), 17, § 1.9(25), § 1.11(29),
§ 13.4.2(546)
convenience, § 1.9(25)
Counting, § 1.5(17)
critical value, 251
cumulative, § 1.8(21), § 1.9(25), § 1.11(29),
§ 1.12(37)
Cumulative relative frequency, 21
curve, § 12.4(504)

D Data, § 1.1(13), § 1.2(13), 13, § 1.4(15), 16, 
§ 1.5(17), § 1.6(18), § 1.9(25), § 1.10(26), 
§ 1.11(29), § 1.12(37), § 2.1(45), § 2.2(45), 
§ 2.4(49), § 13.3(541), § 13.4.1(544), 
§ 13.4.5(552), § 13.5.2(556) 
degrees of freedom, 320, § 12.3(502), 
§ 12.4(504) 

degrees of freedom (df), 415 
descriptive, § 1.2(13), § 2.2(45), § 2.3(46),
§ 2.9(68), § 2.11(72)
deviation, § 2.9(68), § 2.11(72)
diagram, § 4.6(170), § 4.7(171)
dice, § 5.12(234)

Discrete, § 1.5(17), 17, § 1.9(25), § 1.11(29), 

Module: "Descriptive Statistics: Histogram"

Used here as: "Histograms" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16298/1.13/

Pages: 49-53

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Measuring the Center of the Data" 

Used here as: "Measures of the Center of the Data" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17102/1.11/

Pages: 53-56

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Skewness and the Mean, Median, and Mode" 

Used here as: "Skewness and the Mean, Median, and Mode" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17104/1.9/

Pages: 56-58

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/



ATTRIBUTIONS 597 

Module: "Descriptive Statistics: Measuring the Spread of the Data" 

Used here as: "Measures of the Spread of the Data" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17103/1.14/

Pages: 58-66

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Summary of Formulas" 

Used here as: "Summary of Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16310/1.9/

Page: 67

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Practice 1" 

Used here as: "Practice 1: Center of the Data" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16312/1.12/

Pages: 68-70

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Descriptive Statistics: Practice 2" 

Used here as: "Practice 2: Spread of the Data" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17105/1.11/

Page: 71

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16801/1.24/

Pages: 72-88

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Descriptive Statistics Lab" 

Used here as: "Lab: Descriptive Statistics" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16299/1.13/

Pages: 89-90

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Introduction"

Used here as: "Linear Regression and Correlation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17089/1.5/

Page: 97

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Linear Equations" 

Used here as: "Linear Equations" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17086/1.4/

Pages: 97-99

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Slope and Y-Intercept of a Linear Equation" 

Used here as: "Slope and Y-Intercept of a Linear Equation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17083/1.5/

Page: 99

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Scatter Plots" 

Used here as: "Scatter Plots" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17082/1.6/

Pages: 100-101

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: The Regression Equation" 

Used here as: "The Regression Equation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17090/1.14/

Pages: 102-107

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Correlation Coefficient and Coefficient of Determination" 

Used here as: "Correlation Coefficient and Coefficient of Determination" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17092/1.11/

Pages: 108-109

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Testing the Significance of the Correlation Coefficient"

Used here as: "Testing the Significance of the Correlation Coefficient" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17077/1.14/

Pages: 110-114

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Prediction" 

Used here as: "Prediction" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17095/1.7/

Page: 115

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Outliers" 

Used here as: "Outliers" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17094/1.13/

Pages: 115-121

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: 95% Critical Values of the Sample Correlation Coefficient Table"

Used here as: "95% Critical Values of the Sample Correlation Coefficient Table" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17098/1.5/

Pages: 122-124

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Summary" 

Used here as: "Summary" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17081/1.4/

Page: 125

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Practice" 

Used here as: "Practice: Linear Regression" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17088/1.8/

Pages: 126-128

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Homework"

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17085/1.13/

Pages: 129-143

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Regression Lab I" 

Used here as: "Lab 1: Regression (Distance from School)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17080/1.10/

Pages: 144-146

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Regression Lab II" 

Used here as: "Lab 2: Regression (Textbook Cost)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17087/1.9/

Pages: 147-148

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Regression Lab III" 

Used here as: "Lab 3: Regression (Fuel Efficiency)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17079/1.8/

Pages: 149-151

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Probability Topics: Introduction" 

Used here as: "Probability Topics" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16838/1.11/

Pages: 157-158

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Terminology" 

Used here as: "Terminology" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16845/1.13/

Pages: 158-160

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/



Module: "Probability Topics: Independent & Mutually Exclusive Events" 

Used here as: "Independent and Mutually Exclusive Events" 

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16837/1.14/

Pages: 160-163

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Two Basic Rules of Probability" 

Used here as: "Two Basic Rules of Probability" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16847/1.11/

Pages: 163-167

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Contingency Tables" 

Used here as: "Contingency Tables" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16835/1.12/

Pages: 167-170

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Venn Diagrams (optional)" 

Used here as: "Venn Diagrams (optional)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16848/1.12/

Pages: 170-171

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Tree Diagrams (optional)" 

Used here as: "Tree Diagrams (optional)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16846/1.10/

Pages: 171-175

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Probability Topics: Summary of Formulas" 

Used here as: "Summary of Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16843/1.5/

Page: 176

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Practice"

Used here as: "Practice 1: Contingency Tables" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16839/1.11/

Pages: 177-178

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Practice II" 

Used here as: "Practice 2: Calculating Probabilities" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16840/1.12/

Page: 179

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16836/1.21/

Pages: 180-190

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16842/1.9/

Pages: 191-192

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Probability Topics: Probability Lab" 

Used here as: "Lab: Probability Topics" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16841/1.15/

Pages: 193-195

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Discrete Random Variables: Introduction" 

Used here as: "Discrete Random Variables" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16825/1.14/

Pages: 203-204

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Probability Distribution Function (PDF) for a Discrete Random Variable"

Used here as: "Probability Distribution Function (PDF) for a Discrete Random Variable"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16831/1.14/

Pages: 204-205

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Mean or Expected Value and Standard Deviation" 

Used here as: "Mean or Expected Value and Standard Deviation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16828/1.16/

Pages: 205-208

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Common Discrete Probability Distribution Functions" 

Used here as: "Common Discrete Probability Distribution Functions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16821/1.6/

Page: 208

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Binomial" 

Used here as: "Binomial" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16820/1.16/

Pages: 208-211

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Summary of the Discrete Probability Functions" 

Used here as: "Summary of Functions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16833/1.10/

Pages: 212-213

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 1: Discrete Distributions" 

Used here as: "Practice 1: Discrete Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16830/1.14/

Page: 214

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 2: Binomial Distribution"

Used here as: "Practice 2: Binomial Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17107/1.18/

Pages: 215-216

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16823/1.20/

Pages: 217-226

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16832/1.11/

Pages: 227-229

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Lab I" 

Used here as: "Lab 1: Discrete Distribution (Playing Card Experiment)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16827/1.12/

Pages: 230-233

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Lab II" 

Used here as: "Lab 2: Discrete Distribution (Lucky Dice Experiment)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16826/1.12/

Pages: 234-237

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Introduction" 

Used here as: "The Normal Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16979/1.12/

Pages: 245-246

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Standard Normal Distribution"

Used here as: "The Standard Normal Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16986/1.7/

Page: 246

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Z-scores" 

Used here as: "Z-scores" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16991/1.9/

Pages: 247-248

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Areas to the Left and Right of x" 

Used here as: "Areas to the Left and Right of x" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16976/1.5/

Page: 249

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Calculations of Probabilities" 

Used here as: "Calculations of Probabilities" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16977/1.12/

Pages: 249-252

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Summary of Formulas" 

Used here as: "Summary of Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16987/1.5/

Page: 253

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Practice" 

Used here as: "Practice: The Normal Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16983/1.10/

Pages: 254-255

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Homework"

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16978/1.20/

Pages: 256-261

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16985/1.10/

Pages: 262-263

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Normal Distribution Lab I" 

Used here as: "Lab 1: Normal Distribution (Lap Times)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16981/1.18/

Pages: 264-266

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Normal Distribution Lab II" 

Used here as: "Lab 2: Normal Distribution (Pinkie Length)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16980/1.16/

Pages: 267-268

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Introduction" 

Used here as: "The Central Limit Theorem" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16953/1.17/

Pages: 273-274

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem for Sample Means" 

Used here as: "The Central Limit Theorem for Sample Means (Averages)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16947/1.23/

Pages: 274-276

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem for Sums"

Used here as: "The Central Limit Theorem for Sums" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16948/1.16/

Pages: 277-278

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Using the Central Limit Theorem" 

Used here as: "Using the Central Limit Theorem" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16958/1.21/

Pages: 278-285

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Summary of Formulas" 

Used here as: "Summary of Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16956/1.8/

Page: 286

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Practice" 

Used here as: "Practice: The Central Limit Theorem" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16954/1.12/

Pages: 287-289

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16952/1.24/

Pages: 290-296

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16955/1.12/

Pages: 297-298

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem Lab I"

Used here as: "Lab 1: Central Limit Theorem (Pocket Change)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16950/1.10/

Pages: 299-302

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem Lab II" 

Used here as: "Lab 2: Central Limit Theorem (Cookie Recipes)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16945/1.11/

Pages: 303-307

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Introduction" 

Used here as: "Confidence Intervals" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16967/1.16/

Pages: 311-313

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval, Single Population Mean, Population Standard Deviation Known, Normal"

Used here as: "Confidence Interval, Single Population Mean, Population Standard Deviation Known, Normal"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16962/1.23/

Pages: 313-320

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval, Single Population Mean, Standard Deviation Unknown, Student's-t"

Used here as: "Confidence Interval, Single Population Mean, Standard Deviation Unknown, Student-T"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16959/1.24/

Pages: 320-323

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval for a Population Proportion" 

Used here as: "Confidence Interval for a Population Proportion" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16963/1.20/

Pages: 323-327

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Summary of Formulas"

Used here as: "Summary of Formulas"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16973/1.8/

Page: 328

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Practice 1" 

Used here as: "Practice 1: Confidence Intervals for Averages, Known Population Standard Deviation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16970/1.13/

Pages: 329-330

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Practice 2" 

Used here as: "Practice 2: Confidence Intervals for Averages, Unknown Population Standard Deviation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16971/1.14/

Pages: 331-332

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Practice 3" 

Used here as: "Practice 3: Confidence Intervals for Proportions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16968/1.13/

Pages: 333-334

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16966/1.16/

Pages: 335-344

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16972/1.10/

Pages: 345-347

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval Lab I"

Used here as: "Lab 1: Confidence Interval (Home Costs)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16960/1.11/

Pages: 348-350

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval Lab II" 

Used here as: "Lab 2: Confidence Interval (Place of Birth)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16961/1.11/

Pages: 351-352

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval Lab III" 

Used here as: "Lab 3: Confidence Interval (Womens' Heights)" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16964/1.12/

Pages: 353-354

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Introduction" 

Used here as: "Hypothesis Testing: Single Mean and Single Proportion" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16997/1.11/

Pages: 361-362

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Null and Alternate Hypotheses" 

Used here as: "Null and Alternate Hypotheses" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16998/1.14/

Pages: 362-363

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Outcomes and the Type I and Type II Errors"

Used here as: "Outcomes and the Type I and Type II Errors" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17006/1.8/

Pages: 363-364

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Distribution Needed for Hypothesis Testing"

Used here as: "Distribution Needed for Hypothesis Testing"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17017/1.13/

Pages: 364-365

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Assumptions" 

Used here as: "Assumption" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/ml7002/L16/ 

Page: 365 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Rare Events" 

Used here as: "Rare Events" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16994/1.8/

Page: 365 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Using the Sample to Test the Null Hypothesis"

Used here as: "Using the Sample to Support One of the Hypotheses" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16995/1.17/

Pages: 366-367 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Decision and Conclusion" 

Used here as: "Decision and Conclusion" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16992/1.11/

Page: 367 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Additional Information" 

Used here as: "Additional Information" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16999/1.13/

Pages: 367-368 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Summary of the Hypothesis Test"

Used here as: "Summary of the Hypothesis Test" 

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16993/1.6/

Page: 369 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Examples" 

Used here as: "Examples" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17005/1.25/

Pages: 369-379 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Summary of Formulas" 

Used here as: "Summary of Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16996/1.9/

Page: 380 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Practice 1" 

Used here as: "Practice 1: Single Mean, Known Population Standard Deviation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17004/1.11/

Pages: 381-382 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Practice 2" 

Used here as: "Practice 2: Single Mean, Unknown Population Standard Deviation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17016/1.12/

Pages: 383-384 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Practice 3" 

Used here as: "Practice 3: Single Proportion" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17003/1.15/

Pages: 385-386 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Homework"

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17001/1.14/

Pages: 387-399 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17013/1.12/

Pages: 400-402 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Lab" 

Used here as: "Lab: Hypothesis Testing of a Single Mean and Single Proportion" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17007/1.12/

Pages: 403-406 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Introduction" 

Used here as: "Hypothesis Testing: Two Population Means and Two Population Proportions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17029/1.9/

Pages: 413-414 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Comparing Two Independent Population Means with Unknown Population Standard Deviations"

Used here as: "Comparing Two Independent Population Means with Unknown Population Standard Deviations"

By: Susan Dean, Barbara Illowsky, Ph.D. 
URL: http://cnx.org/content/m17025/1.18/
Pages: 414-417 

Copyright: Maxfield Foundation 
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Comparing Two Independent Population Means with Known Population Standard Deviations"

Used here as: "Comparing Two Independent Population Means with Known Population Standard Deviations"

By: Susan Dean, Barbara Illowsky, Ph.D. 
URL: http://cnx.org/content/m17042/1.10/
Pages: 417-419 

Copyright: Maxfield Foundation 
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Comparing Two Independent Population Proportions"

Used here as: "Comparing Two Independent Population Proportions" 

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17043/1.12/

Pages: 419-421 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Matched or Paired Samples"

Used here as: "Matched or Paired Samples" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17033/1.15/

Pages: 421-425 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Summary of Types of Hypothesis Tests"

Used here as: "Summary of Types of Hypothesis Tests" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17044/1.5/

Page: 426 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Practice 1" 

Used here as: "Practice 1: Hypothesis Testing for Two Proportions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17027/1.13/

Pages: 427-428 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Practice 2" 

Used here as: "Practice 2: Hypothesis Testing for Two Averages" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17039/1.12/

Pages: 429-430 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Two Means and Two Proportions: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17023/1.21/

Pages: 431-442 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Two Means and Two Proportions: Review"

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17021/1.9/

Pages: 443-444 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Two Means and Two Proportions: Lab I" 

Used here as: "Lab: Hypothesis Testing for Two Means and Two Proportions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17022/1.13/

Pages: 445-449 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Introduction" 

Used here as: "The Chi-Square Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17048/1.9/

Pages: 455-456 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Notation" 

Used here as: "Notation" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17052/1.6/

Page: 456 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Facts About The Chi-Square Distribution" 

Used here as: "Facts About the Chi-Square Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17045/1.6/

Pages: 456-457 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Goodness-of-Fit Test" 

Used here as: "Goodness-of-Fit Test" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17192/1.8/

Pages: 458-465 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Test of Independence"

Used here as: "Test of Independence" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17191/1.12/

Pages: 465-469 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Summary of Formulas" 

Used here as: "Summary of Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17058/1.8/

Page: 470 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Practice 1" 

Used here as: "Practice 1: Goodness-of-Fit Test" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17054/1.12/

Pages: 471-472 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Practice 2" 

Used here as: "Practice 2: Contingency Tables" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17056/1.12/

Pages: 473-474 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17028/1.20/

Pages: 475-483 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17057/1.10/

Pages: 484-487 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Lab I"

Used here as: "Lab 1: Chi-Square Goodness-of-Fit" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17049/1.9/

Pages: 488-492 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Lab II" 

Used here as: "Lab 2: Chi-Square Test for Independence" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17050/1.11/

Pages: 493-494 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and ANOVA: Introduction" 

Used here as: "F Distribution and ANOVA" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17065/1.7/

Page: 501 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "F Distribution and ANOVA: Purpose and Basic Assumption of ANOVA" 

Used here as: "ANOVA" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17068/1.6/

Page: 502 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and ANOVA: The F Distribution And The F Ratio" 

Used here as: "The F Distribution and the F Ratio" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17076/1.9/

Pages: 502-504 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and ANOVA: Facts About the F Distribution" 

Used here as: "Facts About the F Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17062/1.11/

Pages: 504-508 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and ANOVA: Summary"

Used here as: "Summary" 

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17072/1.3/

Page: 509 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "F Distribution and ANOVA: Practice" 

Used here as: "Practice: ANOVA" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17067/1.8/

Pages: 510-511 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "F Distribution and ANOVA: Homework" 

Used here as: "Homework" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17063/1.9/

Pages: 512-513 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "F Distribution and ANOVA: Review" 

Used here as: "Review" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17070/1.8/

Pages: 514-517 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "F Distribution and ANOVA: ANOVA Lab" 

Used here as: "Lab: ANOVA" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17061/1.8/

Pages: 518-519 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Practice Final Exam 1" 

Used here as: "Practice Final Exam 1" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16304/1.16/

Pages: 523-531 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Practice Final Exam 2"

Used here as: "Practice Final Exam 2" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16303/1.15/

Pages: 532-540 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Data Sets" 

Used here as: "Data Sets" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17132/1.5/

Pages: 541-543 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Collaborative Statistics: Projects: Univariate Data" 

Used here as: "Group Project: Univariate Data" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17142/1.8/

Pages: 544-545 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Projects: Continuous Distributions & Central Limit Theorem" 

Used here as: "Group Project: Continuous Distributions and Central Limit Theorem" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17141/1.9/

Pages: 546-548 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Projects: Hypothesis Testing Article" 

Used here as: "Partner Project: Hypothesis Testing - Article" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17140/1.8/

Pages: 549-550 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Collaborative Statistics: Projects: Hypothesis Testing Word Problem" 

Used here as: "Partner Project: Hypothesis Testing - Word Problem" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17144/1.7/

Page: 551 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Projects: Bivariate Data, Linear Regression and Univariate Data"

Used here as: "Group Project: Bivariate Data, Linear Regression, and Univariate Data" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17143/1.6/

Pages: 552-554 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Solution Sheets: Hypothesis Testing: Single Mean and Single Proportion" 

Used here as: "Solution Sheet: Hypothesis Testing for Single Mean and Single Proportion" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17134/1.6/

Page: 555 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Solution Sheets: Hypothesis Testing: Two Means, Paired Data, Two Proportions"

Used here as: "Solution Sheet: Hypothesis Testing for Two Means, Paired Data, and Two Proportions" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17133/1.6/

Page: 556 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Solution Sheets: The Chi-Square Distribution" 

Used here as: "Solution Sheet: The Chi-Square Distribution" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17136/1.5/

Page: 557 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Solution Sheets: F Distribution and ANOVA" 

Used here as: "Solution Sheet: F Distribution and ANOVA" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m17135/1.5/

Page: 558 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: English Phrases Written Mathematically" 

Used here as: "English Phrases Written Mathematically" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16307/1.5/

Page: 559 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Symbols and their Meanings"

Used here as: "Symbols and their Meanings" 

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m16302/1.9/

Pages: 560-564 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Formulas" 

Used here as: "Formulas" 

By: Susan Dean, Barbara Illowsky, Ph.D. 

URL: http://cnx.org/content/m16301/1.7/

Pages: 565-566 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Collaborative Statistics: Notes for the TI-83, 83+, 84 Calculator" 

Used here as: "Notes for the TI-83, 83+, 84 Calculator" 

By: Barbara Illowsky, Ph.D., Susan Dean 

URL: http://cnx.org/content/m19710/1.6/

Pages: 567-576 

Copyright: Maxfield Foundation 

License: http://creativecommons.org/licenses/by/3.0/

Module: "Tables" 

By: Susan Dean 

URL: http://cnx.org/content/m19138/1.3/

Page: 581 

Copyright: Susan Dean 

License: http://creativecommons.org/licenses/by/2.0/



Collaborative Statistics for MT230 

Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza 
College in Cupertino, California. The textbook was developed over several years and has been used in 
regular and honors-level classroom settings and in distance learning classes. This textbook is intended for 
introductory statistics courses being taken by students at two- and four-year colleges who are majoring in 
fields other than math or engineering. Intermediate algebra is the only prerequisite. The book focuses on 
applications of statistical knowledge rather than the theory behind it. 